I. Summary

The objective of this study was to review and integrate the various 
methodologies used in the study of individual growth (especially
academic growth). This was accomplished by means of Joreskog's general
model for the analysis of covariance structures, i.e., each of the
disparate methodologies available from the literature was shown to be
a special case of Joreskog's general model. Two general considerations
enter into the study of growth and its determinants: (a) making 
provision for errors of measurement and (b) constructing a model which 
relates growth to its determinants in a causally meaningful way. 
Errors of measurement typically involve questions about the reliability 
and/or validity of measures, i.e., only indirect measures of the 
desired variable (construct) are available. Multiple measures of each
construct would appear necessary to deal with measurement errors in a 
quantitative manner. For this purpose the multitrait-multimethod
approach devised by Campbell and Fiske (1959) is useful, since
in principle it allows for correlated errors of measurement. Because
the Campbell-Fiske approach does not specify the exact relationships 
between observed variables and constructs, a factor analytic formulation 
of their approach was used in order to summarize various approaches to 
measurement error. The constructs, which represent the growth variable 
and its determinants, were then interrelated in terms of a linear
structural (causal) model. The implications of this model, which
itself is a special case of Joreskog's general model, were considered.



II. Introduction 



This project was a review and synthesis of educational measurement 
methodologies for studying growth. To this end the initial phases 
consisted of a review of relevant literature in econometrics, 
psychometrics, statistics and sociometry. Some of the concepts which
developed from this review seemed worthy of immediate dissemination
via formal and informal publication media. In particular the following 
articles commented on separate aspects of our review: 

Werts, Charles E., Joreskog, Karl G., & Linn, Robert L. Comment
on "The estimation of measurement error in panel data."
American Sociological Review, 1971, 36, 110-113.

Werts, Charles E., & Linn, Robert L. Comment on Boyle's "Path
Analysis and Ordinal Data." American Journal of Sociology,
1971, 76, 1109-1112.

Werts, Charles E., & Linn, Robert L. Errata to the Werts-Linn
Comments on Boyle's "Path Analysis and Ordinal Data."
American Journal of Sociology, 1972, in press.

Werts, Charles E., Linn, Robert L., & Joreskog, Karl G. Another
perspective on "Linear regression, structural relations,
and measurement error." Educational and Psychological
Measurement, in press.

Werts, Charles E., Linn, Robert L., & Joreskog, Karl G. A
congeneric model for platonic true scores. Research
Bulletin 71-22, Educational Testing Service, Princeton,
New Jersey, May 1971. Also in Educational and
Psychological Measurement, in press.

Werts, Charles E., & Linn, Robert L. Estimating true scores
using group membership. Educational and Psychological
Measurement, in press.

Linn, Robert L., & Werts, Charles E. Errors of inference due
to errors of measurement. Research Bulletin 71-7,
Educational Testing Service, Princeton, New Jersey,
February 1971. Also in Educational and Psychological
Measurement, in press.

Werts, Charles E., Joreskog, Karl G., & Linn, Robert L.
Identification and estimation in path analysis with
unmeasured variables. Research Bulletin 71-39,
Educational Testing Service, Princeton, New Jersey,
June 1971. Also in American Journal of Sociology,
in press.

Werts, Charles E., Linn, Robert L., & Joreskog, K. G. Intraclass
reliability estimates: testing structural assumptions.
Educational and Psychological Measurement, in press.

Copies of these articles are included in the Appendix. Those aspects 
directly relevant to the project goal are treated in the review
sections which follow. 

For heuristic purposes the review and synthesis of the literature 
has been treated in two parts. The first part (Sec. III), labelled
"Quantifying Unmeasured Variables," treats the general methodological
considerations relevant to growth studies and a wide variety of the
problems involving errors of measurement and causal analyses. This
part will appear in a new book, Theories and strategies of measurement
in the social sciences, H. M. Blalock, editor. Blalock's books are
widely used in the social sciences as textbooks. 

The second part of our review (Sec. IV), labelled "A multitrait-
multimethod model for studying growth," reviews various psychometric
formulations specifically relevant to growth studies and formally
treats them as a special case of Joreskog's general model for the
analysis of covariance structures. Implications for factor analytic
studies of growth data and for studies of the determinants of growth
are detailed. This part will appear in Educational and Psychological
Measurement and has been released in preliminary form using the
Educational Testing Service Research Bulletin series.






III. General Methodological Considerations: Quantifying Unmeasured
Variables



Social scientists frequently wish to make inferences about the 
"effects" of hypothetical constructs which are not directly measured, 
e.g., only the symptoms, antecedents, and/or consequences of the 
construct may be measurable. In recent years a variety of statistical 
procedures have been introduced to help quantify the relationships 
among observed variables and constructs in an attempt to increase 
the rigor and validity of such inferences. The purpose of this essay 
is to introduce the various concepts and to consider the numerous 
assumptions involved in these procedures so that the user will be 
aware of analytical potentials and limitations. 

1. Validity 

A basic concept in the discussion of indirectly measured concepts
is that of validity. This refers to the relationship between an
observed variable (X) and the unmeasured construct (Y). We shall
discuss models in which it is assumed that the relationship is
linear, i.e.,

(1)    $X = bY + I + e$

where b is the slope of the regression of X on Y, I is the intercept
of this regression line, and e is a residual which is taken to
be independent of Y. Econometricians (e.g., Goldberger, 1970)
typically specify b = 1, I = 0, and e is labelled a disturbance
instead of the psychometric term errors of measurement. Despite the
crucial importance of this linear relationship, it is seldom that
data analysts substantively justify this assumption. For example,
ability and achievement test scores are generally assumed to have a
linear relationship with their underlying true scores; however,
Carver (1969) has persuasively argued that there is a curvilinear
relationship between knowledge (the construct) and test scores in
classroom learning, i.e., more knowledge is required to increase the
test score one point at the high end of the scale. When psychologists
use the term validity coefficient they are usually referring to the
correlation (i.e., $R_{XY}$) between the observed variable and the
construct (i.e., true score) assuming the residuals of X on Y to
be independent of Y (Guilford, 1954, Chap. 14). As long as
consideration is limited to a single variable X and a single
construct Y the linear relationship is not a real limitation, unless
an added constraint such as equal intervals is added, because the Y
could be transformed to yield a linear relationship with X. With
two X's for a single construct the limitation becomes a real one.

It is useful to distinguish between the terms reliability and
validity. A traditional test theorist will typically consider the
correlation between parallel forms ($X_1$ and $X_2$) of a test to be
the reliability coefficient. As illustrated in Fig. 1.a, the model
here is $X_1 = b_1 Y + I_1 + e_1$ and $X_2 = b_2 Y + I_2 + e_2$,
where $e_1$ and $e_2$ are assumed independent of each other and of Y,
which implies that $R_{e_1 e_2} = R_{e_1 Y} = R_{e_2 Y} = 0$. Test
forms are said to be parallel when the variances of $e_1$ and $e_2$
are equal (i.e., $V_{e_1} = V_{e_2}$), $b_1 = b_2$ and $I_1 = I_2$.
It follows that for parallel forms the correlation between the
observed measures will equal the square of the correlation of either
measure with the construct, i.e.,

$R_{X_1 Y}^2 = R_{X_2 Y}^2 = R_{X_1 X_2}$ = reliability coefficient.

If the variable which is being measured by the parallel forms
(i.e., Y) is itself a symptom of another construct (e.g., Z) then new
assumptions must be made, e.g., $Y = bZ + y$ where y is independent
of Z, $e_1$ and $e_2$, as shown in Fig. 1.b. In this case the
correlations between the parallel observed measures and Z are

$R_{X_1 Z} = R_{X_2 Z} = R_{YZ} R_{X_1 Y} = R_{YZ} \sqrt{R_{X_1 X_2}}$.

In this model the $X_i$ on Z residuals have the form
$(X_i - b_i b Z) = b_i y + e_i$, apart from the constant $I_i$, and
the covariance between the $X_1$ and $X_2$ on Z residuals will equal
$b_1 b_2 V_y$. Therefore these residuals are in general correlated
and $R_{YZ}$, $R_{X_1 Z}$ and $R_{X_2 Z}$ cannot be estimated;
however, $\sqrt{R_{X_1 X_2}}$ is the upper limit for the correlations
of the observed measures with Z, i.e., reliability sets an upper bound
on validity. For illustrative purposes consider the problem of
measuring achievement in mathematics for 9th grade students in city 
A. Two (or more) parallel forms of widely used mathematics tests, 
standardized on national samples, can be readily obtained and 
administered. These forms typically have very similar item formats, 
the items differing mainly with respect to the numbers inserted in 
the problems. Because these tests cater to a wide variety 
of schools the items necessarily cover material which is common to 
most curricula at this level. Insofar as the curriculum in city 
A has special emphasis, not generally taught elsewhere, the 
nationally standardized tests will be partly irrelevant (i.e.,
invalid) to city A. The parallel forms would correspond to $X_1$ and
$X_2$ in Fig. 1.b, the variable Y would represent achievement on
generally taught problems, and Z would be the achievement of students
in city A. If the discrepancy between Y and Z is very great, as
inferred from curricular differences, then city A could build
equivalent forms, which more precisely cover their coursework, which
might then correspond to the model in Fig. 1.c. It is always necessary
for the researcher to examine test materials in order to see how well
the construct being measured by that test corresponds to the construct
relevant to the research project. In many cases he may decide to use
two measures of a construct with very different types of item formats
in order to obtain a model like Fig. 1.c, i.e., the very similarity of
item formats may give the scores some covariation which does not
represent association due to the underlying construct to be measured
(as in Fig. 1.b).
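
The relations between reliability and validity for parallel forms
discussed above can be checked numerically. The following is only a
minimal sketch under stated assumptions: the function name and the
parameter values are hypothetical, a positive slope b is assumed, and
the correlations are computed from the model parameters rather than
estimated from data.

```python
import math

def parallel_forms_correlations(b, var_y, var_e, r_yz=1.0):
    """Implied correlations for two parallel forms X1, X2 of a construct Y.

    b, var_y and var_e are the common slope, the variance of Y and the
    common error variance (illustrative inputs). r_yz is the correlation
    of Y with a further construct Z; r_yz = 1 means Y and Z coincide.
    """
    var_x = b * b * var_y + var_e            # variance of either form
    reliability = b * b * var_y / var_x      # R_{X1 X2}
    r_xy = b * math.sqrt(var_y / var_x)      # R_{XY}, validity for Y
    r_xz = r_yz * r_xy                       # R_{XZ}, validity for Z
    return reliability, r_xy, r_xz

reliability, r_xy, r_xz = parallel_forms_correlations(
    b=1.0, var_y=4.0, var_e=1.0, r_yz=0.7)
print(f"reliability R_X1X2 = {reliability:.3f}")
print(f"validity for Y     = {r_xy:.3f}")
print(f"validity for Z     = {r_xz:.3f}")
print(f"upper bound        = {math.sqrt(reliability):.3f}")
```

With these illustrative values the reliability equals the squared
correlation of a form with Y, and the correlation with Z stays below
the square root of the reliability.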

Instead of validity coefficients, factor analysts (e.g., Harman, 
1967) refer to factor loadings . A factor loading is the regression 
weight of an observed score on a factor (viz., construct). The models 
in Fig. 1.a and 1.c correspond to a single factor model and the
standardized factor loading is equal to the correlation of the observed
score with the factor, like the corresponding reliability and validity
coefficients. If there were more than one factor, but these factors
were uncorrelated as in an orthogonal solution, then the standardized
factor loading would still equal the correlation. In the case of
correlated factors as in an oblique solution, the standardized factor
loadings are standardized partial regression weights which are
called path coefficients by path analysts (e.g., Duncan, 1966;
Wright, 1934).

The regression weight in Equation (1) basically states the
relationship between the units of measurement of the observed
variable and that of the construct. A weight equal to unity
corresponds to the assumption that the observed measure and the
construct have the same units of measurement. Psychological test
theorists and econometricians usually make this assumption, whereas
path (Blalock, 1969; Costner, 1969) and factor analysts commonly
assign the factor a variance of unity (i.e., $V_Y = 1$). As shall be
noted later, this assumption creates no difficulty until the problem
involves multiple measures of a construct and/or growth along the
same dimension over time (Werts, Joreskog, & Linn, 1972).

2. Multiple Measures of a Single Construct

Although econometricians rarely are concerned with multiple
measures of a construct, test theorists and path and factor analysts
have written extensively on this topic. Much of modern test theory
(Lord & Novick, 1968) is derived assuming at least two tau equivalent
measures of the underlying true score (i.e., construct). Tau
equivalent measures (e.g., $X_1$ and $X_2$) are those in which the
observed on true regression weights are unity (i.e., $b_1 = b_2 = 1$),
the intercepts are equal (i.e., $I_1 = I_2$) and the errors of
measurement are independent of each other and of the true score.
Essentially tau equivalent measures are the same except that
$I_1 \neq I_2$. In contrast to the parallel forms assumptions
discussed previously, the error variances are not assumed equal
(i.e., $V_{e_1} \neq V_{e_2}$) for tau equivalent or essentially tau
equivalent measures, which means that the tests may have different
reliabilities (i.e., differing error variances). Since by assumption
$X_1 = Y + I_1 + e_1$ and $X_2 = Y + I_2 + e_2$, the covariance
$C_{X_1 X_2} = V_Y$, i.e., the covariance between the observed scores
is equal to the variance of the true scores. The true variance divided
by the observed variance (e.g., $V_{X_i}$) for a test yields the
reliability, i.e., $V_Y \div V_{X_i}$.
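
The tau equivalent result just stated can be illustrated with a few
lines of code. This is a sketch only: the function name and the
numerical values are hypothetical, and population variances and
covariances are supplied directly instead of being estimated from a
sample.

```python
def tau_equivalent_reliabilities(var_x1, var_x2, cov_x1x2):
    """Reliabilities of two (essentially) tau equivalent measures.

    Under the assumptions in the text the covariance of the observed
    scores equals the true score variance V_Y, so each reliability is
    that covariance divided by the observed variance of the test.
    """
    true_variance = cov_x1x2                  # C_{X1 X2} = V_Y
    return (true_variance,
            true_variance / var_x1,           # V_Y / V_{X1}
            true_variance / var_x2)           # V_Y / V_{X2}

# Illustrative values: V_Y = 9, error variances 1 and 3.
true_variance, rel_1, rel_2 = tau_equivalent_reliabilities(
    var_x1=10.0, var_x2=12.0, cov_x1x2=9.0)
print(true_variance, rel_1, rel_2)
```

The two tests share the same true variance but have different
reliabilities because their error variances differ.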

Essentially tau equivalent and tau equivalent measures assume that 
the observed measures of the construct have the same units of measure- 
ment. When measuring different symptoms or indicators of an under- 
lying construct it is quite common to have different units, e.g., 
income and occupation as indicators of socioeconomic status typically 
are measured in different units. In this case the unit of the 
construct is arbitrary and is usually fixed by assigning a variance
of unity, although it is also possible to identify the unit of one of
the measures with that of the construct by specifying the corresponding
regression weight to be unity. Joreskog (1971) calls the various
measures of the construct congeneric measures ($b_i \neq b_j$), whereas

factor analysts would say that a single factor structure has been 
assumed. In each case the errors or residuals are assumed independent 
of each other and of the construct. 



3. Identification 

The concept of identification is crucial to any comparison of 
methods. Mathematicians and econometricians (e.g., Fisher, 1966) 
have long been interested in developing procedures for dealing with 
identification problems. Whereas true score theorists and path 
analysts usually attempt to build identified models, the majority of
factor analysts have dealt with highly underidentified models.
Although in principle sociologists were exposed to the identification 
issue in relation to latent structure analysis (e.g., Lazarsfeld, 
1950), the recent papers on this subject by path analysts (e.g., 
Boudon, 1965; Blalock, 1966) have probably had a wider impact. The 
term identifiable will be used here in the sense defined by Fisher 
(1966, p. 25): "We shall speak of that equation as identifiable
(or identified) if there exists some combination of prior and posterior 
information which will enable us to distinguish its parameters from 
those of any other equation in the same form." 

To illustrate the identification problem let us consider a single
factor model from the perspective of path analysis (Costner, 1969).
Suppose we are given four observed measures ($X_1$, $X_2$, $X_3$,
$X_4$) of the factor (Y). The single factor model specifies that
$X_i = b_i Y + I_i + e_i$ where all $e_i$ are independent of each
other and of Y. The model is depicted in Figure 2 using path analysis
notation, i.e., when variables are independent no arrows connect them.
To obtain the expected covariances ($C_{ij}$) between two observed
measures ($X_i$ and $X_j$) we would multiply the corresponding pair of
equations to obtain:

(2)    $C_{ij} = b_i b_j V_Y$

and

(3)    $V_{X_i} = b_i^2 V_Y + V_{e_i}$

The term expected refers to the value of a parameter to be expected in
a model without sampling or model specification errors. Specification
errors refer to the incorrect choice of a statistical model (Theil,
1937). It is convenient to arrange the expected variances and
covariances given by equations (2) and (3) into an expected variance-
covariance matrix ($\Sigma$), e.g., in the four variable case:



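$$
\Sigma =
\begin{bmatrix}
V_1    & C_{12} & C_{13} & C_{14} \\
C_{12} & V_2    & C_{23} & C_{24} \\
C_{13} & C_{23} & V_3    & C_{34} \\
C_{14} & C_{24} & C_{34} & V_4
\end{bmatrix}
$$
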

To see if this model is identified, the path analyst (e.g., Costner,
1969) typically would standardize all variables
($V_{X_1} = V_{X_2} = V_{X_3} = V_{X_4} = V_Y = 1$) and then derive
the equations for each expected correlation ($R_{ij}$) in terms of the
path coefficients ($b_i^*$) of the model, e.g.,

$R_{12} = b_1^* b_2^*$,   $R_{13} = b_1^* b_3^*$,   $R_{14} = b_1^* b_4^*$,

$R_{23} = b_2^* b_3^*$,   $R_{24} = b_2^* b_4^*$,   and   $R_{34} = b_3^* b_4^*$.

Using any three measures ($X_i$, $X_j$, and $X_k$) it is possible to
solve for the unknown $(b_i^*)^2 = (R_{ij} R_{ik}) \div R_{jk}$. Thus
all parameters ($b_i^*$) are identified, in the sense that each
parameter may be stated as a function of potentially observable
information. The actually observed sample variances and covariances
could also be arranged in a matrix (S). The observed matrix (S) may
differ from the expected matrix ($\Sigma$) because of sampling and
specification errors. The model is usually judged to be incorrect if
$\Sigma$ and S differ very much, i.e., when the observed data do not
fit the model. Quite sophisticated techniques are now available to
obtain parameter estimates which minimize in some sense the difference
between the observed matrix and the expected matrix computed from the
parameter estimates (Hauser & Goldberger, 1970; Joreskog, 1970).
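
The solution of the path equations by triads of measures can be
sketched as follows. The code is illustrative only: the function names
and the loadings are hypothetical, and expected (population)
correlations are generated from the model itself, so no sampling or
specification error is involved.

```python
from itertools import combinations

def implied_correlations(loadings):
    """Expected correlations R_ij = b_i* b_j* of a standardized single
    factor model with independent residuals."""
    return {(i, j): loadings[i] * loadings[j]
            for i, j in combinations(range(len(loadings)), 2)}

def correlation(corr, i, j):
    """Look up R_ij regardless of the order of i and j."""
    return corr[(i, j)] if i < j else corr[(j, i)]

def squared_loading(corr, i, j, k):
    """Solve (b_i*)^2 = R_ij R_ik / R_jk from the triad (i, j, k)."""
    return (correlation(corr, i, j) * correlation(corr, i, k)
            / correlation(corr, j, k))

loadings = [0.9, 0.8, 0.7, 0.6]        # hypothetical b_1*, ..., b_4*
corr = implied_correlations(loadings)

# With four measures the first loading can be solved from three triads.
estimates = [squared_loading(corr, 0, j, k)
             for j, k in combinations([1, 2, 3], 2)]
print(estimates)
```

With four measures each squared loading can be solved from three
different triads. When the model holds exactly all three solutions
coincide; in sample data they would in general differ, and that
disagreement carries the information about fit.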

The equations relating the expected correlations ($R_{ij}$) to the
model parameters ($b_i^*$) are called path equations by path analysts.
When the parameters are identified by these equations, a model is
called just identified if the number of observable quantities
($R_{ij}$) equals the number of unknown parameters ($b_i^*$) in the
path equations and overidentified if the observables exceed the
parameters. If the number of unknown parameters exceeds the number of
observables, then the model is underidentified even though a subset of
the parameters may be identified.
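
The counting rule above can be sketched for the standardized single
factor model. The function names are hypothetical. The count is only a
necessary check: the labels "just identified" and "overidentified"
presuppose, as stated above, that the parameters can actually be solved
from the path equations.

```python
def count_single_factor(p):
    """Observables and unknowns for a standardized single factor model
    with p observed measures and independent residuals."""
    observables = p * (p - 1) // 2     # distinct correlations R_ij
    parameters = p                     # one standardized loading each
    return observables, parameters

def classify(observables, parameters):
    """Compare the number of observables with the number of unknowns."""
    if observables < parameters:
        return "underidentified"
    if observables == parameters:
        return "just identified"
    return "overidentified"

for p in (2, 3, 4):
    obs, par = count_single_factor(p)
    print(p, obs, par, classify(obs, par))
```

Two measures give one correlation for two loadings, three measures
give three for three, and the four measure case discussed above gives
six correlations for four loadings.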

Joreskog labels models which are overidentified as confirmatory.
In confirmatory factor studies the experimenter has already obtained
a certain amount of knowledge about the variables measured and
therefore is in a position to formulate a model which is to be tested
for fit to data. Most factor analysts deal with highly under-
identified models, exploratory factor procedures being used to
suggest an appropriate number of factors to use and a preliminary
interpretation of the data. In contrast, econometricians, path
analysts, and classical test theorists usually deal with identified
models which reflect substantive theoretical considerations. It is
logically possible for the model suggested by exploratory procedures
to be identified, but factor analysts have typically not examined
this question because their main interest is in fit, not in
identifiability.

4. Multifactor Models

Let us consider a simple two factor ($Y_1$ and $Y_2$) model
(Fig. 3) in which there is only one observed measure ($X_1$ and $X_2$)
of each factor, i.e., $X_1 = b_1 Y_1 + I_1 + e_1$ and
$X_2 = b_2 Y_2 + I_2 + e_2$, where $e_1$ and $e_2$ are independent of
each other and of $Y_1$ and $Y_2$. When all variables are standardized
there is one observed correlation ($R_{12}$) and three unknown
correlations ($R_{X_1 Y_1} = b_1^*$, $R_{Y_1 Y_2}$, and
$R_{X_2 Y_2} = b_2^*$) among variables (i.e., the model is
underidentified) and $R_{12} = R_{X_1 Y_1} R_{Y_1 Y_2} R_{X_2 Y_2}$.
Psychometricians call the correlation between the factors
($R_{Y_1 Y_2}$) the unattenuated correlation. In the case of tests,
the publisher usually provides test reliabilities (labelled $R_{11}$
and $R_{22}$) which in this model might be used to estimate (denoted
by "^") the square of the correlation with the appropriate factor,
i.e., $R_{11} = \hat{R}_{X_1 Y_1}^2$ and
$R_{22} = \hat{R}_{X_2 Y_2}^2$. Given these reliabilities we may
estimate the correlation between factors as:

$\hat{R}_{Y_1 Y_2} = R_{12} \div \sqrt{R_{11} R_{22}}$

This procedure is called correcting for attenuation.
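
The correction for attenuation can be written as a short function.
This is a minimal sketch: the function name and the example values are
hypothetical, and the reliabilities are taken as given, which is
exactly the assumption the correction depends on.

```python
import math

def correct_for_attenuation(r12, r11, r22):
    """Estimate the correlation between two factors from the observed
    correlation r12 and the reliabilities r11 and r22 of the tests."""
    if not (0.0 < r11 <= 1.0 and 0.0 < r22 <= 1.0):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r12 / math.sqrt(r11 * r22)

# Illustrative values: reliabilities 0.81 and 0.64, observed r = 0.36.
unattenuated = correct_for_attenuation(0.36, 0.81, 0.64)
print(round(unattenuated, 3))
```

If the supplied reliabilities are too low for the subpopulation at
hand, the corrected value can exceed unity, which is one practical
sign that the reliabilities are in error.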



A. Exact Functional Relationship Among Factors

Statisticians (e.g., Kendall & Stuart, 1961) and econometricians
(e.g., Johnston, 1963, Chap. 6) have been interested in the variation
of the Fig. 3 model in which $b_1 = b_2 = 1$ and the factors have an
exact functional relationship, i.e., $Y_2 = I + BY_1$ and
$R_{Y_1 Y_2} = 1$. It might for example be hypothesized that in a
class of equally intelligent and motivated students, the amount they
will learn in a math course ($Y_2$) will be directly proportional to
their relevant mathematics skills ($Y_1$) at the beginning of the
course because, e.g., those who know more are better able to
understand the teacher. Neglecting variable means, since there are
three unknown parameters ($b_1^*$, $b_2^*$, B) and only one observed
correlation ($R_{12}$), this model is underidentified. Isaac (1970)
reviews the estimating formulae for the case in which the error
variances $V_{e_1}$ and/or $V_{e_2}$ or their ratio
$V_{e_1} \div V_{e_2}$ are known.
B. Stochastic Components

Johnston (1963, p. 148) notes that the exact functional relation-
ship [...] which it is generally necessary in linear structural
models to assume that all the other unmeasured variables influencing
a variable [...] are independent of the influences that are measured
(Blalock, 1967). It seems most unlikely, for example, [...] not other
disturbing factors which will influence mathematics achievement.

Adding a stochastic disturbance term, v, representing these
other variables, the equation between the factors becomes
$Y_2 = I + BY_1 + v$ where v is independent of $Y_1$ and
$b_1 = b_2 = 1$. The analysis of this stochastic model is discussed
by Johnston (1963, Chap. 6). One approach assumes that the error
variances $V_{e_1}$ and $V_{e_2}$ are known, which is equivalent to
the psychometrician's approach [...]. The difficulty with this
approach lies [...]. Even when reliabilities are given [...] for the
tests, these figures may be erroneous to an unknown degree for the
particular subpopulation being tested.

Another approach is the use of instrumental variables, i.e., a
variable (Z) which is independent of the errors $e_1$ and $e_2$. In
this case the regression weight may be estimated as
$\hat{B} = \mathrm{cov}(Y_2, Z) \div \mathrm{cov}(Y_1, Z)$. It may be
shown that the reliability coefficient for $X_1$ is
$R_{11} = R_{X_1 Z} R_{X_1 X_2} \div R_{X_2 Z}$, which from the
previous section can be seen as the solution for the squared factor
loading in the single factor model in which $X_1$, $X_2$, and Z are
congeneric measures [...]. Further analysis would show that $V_v$ and
$V_{e_2}$ are not identified. The basic problem in use of instrumental
variables is that we are seldom in a position to check whether this
variable is in fact independent of errors, yet the estimates are 
likely to be highly dependent on which such variable is selected 
(Blalock, Wells, & Carter, 1970). The same problem plagues the use 
of the congeneric model since it is seldom obvious exactly which 
observed measures really are indicators of the same underlying 
trait assuming independent errors. It is interesting to note that
in these models an instrumental variable substitutes for a congeneric 
measure, i.e., what is needed is a third measure which is independent 
of the errors in the other two variables. For illustrative purposes 
consider the problem of measuring differential student math achieve- 
ment given the scores from two different nationally distributed 
objective exams, one perhaps using a problem format and another a 
multiple choice format, whose validities for the curriculum of
interest are unknown. A third congeneric measure might well be the
course grades given by the teacher. The logic here is that these 
should all be tapping the achievement dimension but to differing 
degrees and there is no a priori reason to believe that errors of
measurement among these measures are correlated since very 
different formats are involved. Sometimes, however, achievement 
tests are given in batteries such that the needed third measure 
might be in another content area. For example, English achievement 
scores might be available. It is unlikely that this test is
correlated with errors of measurement on the two objective math
tests and this could therefore serve as an instrumental variable.
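
The instrumental variable estimate discussed in this section can be
illustrated by a small simulation. Everything in the sketch is an
assumption made for illustration: the function names, the normal
distributions, the parameter values and the sample size. Because the
errors are independent of Z and the observed on true regression
weights are unity, the covariances of Z with the observed scores equal
those with the factors, so the observed scores are used in the
estimate.

```python
import random

def covariance(xs, ys):
    """Unbiased sample covariance of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y)
               for x, y in zip(xs, ys)) / (n - 1)

def simulate(n, slope, seed=0):
    """Draw n cases from Y2 = 3 + slope * Y1 + v with fallible measures
    X1 = Y1 + e1, X2 = Y2 + e2 and an instrument Z related to Y1 but
    independent of e1, e2 and v."""
    rng = random.Random(seed)
    x1, x2, z = [], [], []
    for _ in range(n):
        z_i = rng.gauss(0.0, 1.0)
        y1 = z_i + rng.gauss(0.0, 0.5)
        y2 = 3.0 + slope * y1 + rng.gauss(0.0, 1.0)   # disturbance v
        x1.append(y1 + rng.gauss(0.0, 1.0))           # error e1
        x2.append(y2 + rng.gauss(0.0, 1.0))           # error e2
        z.append(z_i)
    return x1, x2, z

x1, x2, z = simulate(20000, slope=2.0)
iv_estimate = covariance(x2, z) / covariance(x1, z)
ols_estimate = covariance(x1, x2) / covariance(x1, x1)
print(f"instrumental variable estimate: {iv_estimate:.2f}")
print(f"ordinary regression estimate:   {ols_estimate:.2f}")
```

In this setup the ordinary regression of $X_2$ on $X_1$ is pulled
toward zero by the error in $X_1$, while the instrumental variable
ratio stays close to the slope used to generate the data. The result
depends entirely on Z being independent of the errors, which is the
assumption the text warns can seldom be checked.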

C. Model with Multiple Indicators 

Economists (e.g., Goldberger, 1970) and sociologists (e.g.,
Blalock, 1969; Costner, 1969) rarely have the data to estimate
reliability from independent sources, whereas psychometricians and
factor analysts (at least implicitly) frequently do so. A
traditional technique of this type used by psychometricians is the
split half procedure (e.g., Guilford, 1954, p. 377). The items on
a test are split in half (e.g., odd items assigned to one half and
even to the other) and the correlation between the halves is used to
estimate the reliability of the whole test, assuming that the halves
are equivalent measures. Various formulae are used to adjust for the
fact that the halves are not as long as the whole test and therefore
not as reliable (Guilford, 1954, Chap. 14). These reliability
estimates may then be used to estimate the unattenuated correlation
between two tests, i.e., the correlation between the two true factors
underlying the observed measures.
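
As an illustration of such an adjustment, the sketch below uses the
Spearman-Brown formula for a test of doubled length. The text above
does not name a particular formula, so the choice of this one is an
assumption made here; the function name and the example value are
hypothetical, and the formula presupposes parallel halves.

```python
def spearman_brown_full_length(half_correlation):
    """Step up the correlation between two parallel halves to the
    reliability of the whole test (Spearman-Brown, doubled length)."""
    if not -1.0 < half_correlation <= 1.0:
        raise ValueError("correlation must lie in (-1, 1]")
    return 2.0 * half_correlation / (1.0 + half_correlation)

whole_test_reliability = spearman_brown_full_length(0.6)
print(whole_test_reliability)
```

A correlation of 0.6 between the halves corresponds to a whole test
reliability of 0.75 under this formula.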

The logic of the split half approach is worth further study.
Changing to a double subscript for each observed measure ($X_{ij}$),
where j refers to the j th construct ($Y_j$) and i to the
i th indicator of the j th construct, the equations in the split half
procedure are:

$X_{11} = b_{11} Y_1 + I_{11} + e_{11}$

$X_{21} = b_{21} Y_1 + I_{21} + e_{21}$

$X_{12} = b_{12} Y_2 + I_{12} + e_{12}$

$X_{22} = b_{22} Y_2 + I_{22} + e_{22}$

Using path analytic procedure we find that:

$R_{X_{11} X_{21}} = b_{11}^* b_{21}^*$

$R_{X_{11} X_{12}} = b_{11}^* R_{Y_1 Y_2} b_{12}^*$

$R_{X_{11} X_{22}} = b_{11}^* R_{Y_1 Y_2} b_{22}^*$

$R_{X_{21} X_{12}} = b_{21}^* R_{Y_1 Y_2} b_{12}^*$

$R_{X_{21} X_{22}} = b_{21}^* R_{Y_1 Y_2} b_{22}^*$

$R_{X_{12} X_{22}} = b_{12}^* b_{22}^*$

Solution of these equations indicates that all the reliabilities
($b_{ij}^{*2}$) and the unattenuated correlation ($R_{Y_1 Y_2}$) are
identified without further assumptions. This model is overidentified
since [...]. [...] to believe that $b_{11} = b_{21}$,
$b_{12} = b_{22}$, $V_{e_{11}} = V_{e_{21}}$, and
$V_{e_{12}} = V_{e_{22}}$, as asserted in the assumption that the
halves are parallel (Werts [...]). [...] notes: "[...]
administration of two different tests or between test administration
and measurement of some criterion in validation" (Guilford, 1954,
p. 377). If these other determiners were independent of the true 
score then in our model this would be equivalent to asserting that
the corresponding errors were not in fact independent (e.g.,
$R_{e_{11} e_{21}} \neq 0$). If this were the case, it is possible
that this might be detected as a lack of model fit to the observed
data.
Psychometricians have various other procedures for estimating whole 
test reliability from item data (Stanley, 1971), the logic being much 
like that discussed here except that each item now becomes an 
observed measure. To the degree that the item data do not fit a 
single factor model these estimates become difficult to interpret 
(Werts & Linn, 1970a). Nonetheless, in practice this fit is seldom 
checked. 
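
The solution of the path equations for the two construct split half
model can be sketched in code. The sketch is illustrative only: the
function name, the measure labels and the numerical loadings are
hypothetical, expected correlations are generated from the model
itself, and the standardized loadings and the correlation between
constructs are assumed to be nonzero.

```python
def solve_split_half(r):
    """Solve the six path equations of the two construct model.

    r maps pairs of measure labels to correlations. Returns the squared
    standardized loadings, the squared correlation between constructs,
    and the discrepancy in the one overidentifying restriction.
    """
    within_1 = r[("X11", "X21")]     # b11* b21*
    within_2 = r[("X12", "X22")]     # b12* b22*
    a = r[("X11", "X12")]            # b11* R b12*
    b = r[("X11", "X22")]            # b11* R b22*
    c = r[("X21", "X12")]            # b21* R b12*
    d = r[("X21", "X22")]            # b21* R b22*
    squared_loadings = {
        "X11": within_1 * a / c,
        "X21": within_1 * c / a,
        "X12": within_2 * a / b,
        "X22": within_2 * b / a,
    }
    squared_factor_correlation = a * d / (within_1 * within_2)
    restriction_gap = a * d - b * c  # zero when the model holds exactly
    return squared_loadings, squared_factor_correlation, restriction_gap

# Hypothetical loadings 0.9, 0.8, 0.7, 0.6 and factor correlation 0.5.
correlations = {
    ("X11", "X21"): 0.9 * 0.8,
    ("X12", "X22"): 0.7 * 0.6,
    ("X11", "X12"): 0.9 * 0.5 * 0.7,
    ("X11", "X22"): 0.9 * 0.5 * 0.6,
    ("X21", "X12"): 0.8 * 0.5 * 0.7,
    ("X21", "X22"): 0.8 * 0.5 * 0.6,
}
squared_loadings, squared_r, gap = solve_split_half(correlations)
print(squared_loadings, squared_r, gap)
```

The restriction gap is what a test of fit would examine in sample
data. Correlated errors between the halves are one reason it could
depart from zero, although not every such departure from the model
need show up in this particular restriction.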



5. The Multitrait-Multimethod Approach 

The multitrait-multimethod matrix technique (Campbell & Fiske,
1959) has been of considerable interest to psychologists because
it provides information on the convergent (confirmation by independent
measurement procedures) and discriminant (separation of one trait
from another) validity of theoretical constructs (i.e., traits). The
problem of measuring mathematics achievement as opposed to achievement
in English may be used to illustrate these concepts. To measure
math achievement we might use three measures including one "subjective"
measure, course grades, and two "objective" measures consisting of a
multiple choice and a mathematics reasoning test (perhaps
constructed by the publisher of the course material). Despite the
differences in format, each measure in principle is simply another 
demonstration of the student's grasp of the subject matter and should 
therefore tend to give fairly consistent results. Insofar as the 
results are indeed consistent, convergent validity is demonstrated. 
The logic underlying convergent validity is much like that of the 
congeneric model previously discussed. The emphasis on different 
methods of measurement represents an attempt to ensure that the 
correlations among variables as much as possible represent commonality 
with the underlying trait rather than consistencies due to similarities 
of testing methods. Thus, use of different methods tends to support 
the assumption of independent errors required by the congeneric model. 
Now suppose that English achievement were also obtained from three 
measures whose format was like that used for math achievement, i.e.,
course grades, a multiple choice and a reasoning test. Discriminant 
validity would be demonstrated if it could be shown that the trait 
(i.e., factor) underlying the math measures were distinctly different 
from the trait underlying the English measures. According to Campbell 
and Fiske, convergent validity is demonstrated by at least moderate
correlations between different methods measures of the same trait
and discriminant validity is shown by a higher correlation between
independent efforts (i.e., methods) to measure the same trait than
between measures designed to get at different traits using the same
method. From our perspective discriminant validity consists of 
demonstrating that the true correlation between two traits is
meaningfully less than unity. Werts and Linn (1970b) have discussed the
Campbell-Fiske approach from this perspective. The analytical
procedures devised by Campbell and Fiske (1959) are not of interest
here because no attempt was made to specify the nature of the
relationship between the observed measures and the trait or methods
factors. It should be clear from our previous statements that an
observed variance-covariance matrix is interpretable only from the
perspective of an hypothesized model. Campbell and Fiske's argument
that the researcher should obtain measures of a trait which differ as
much as possible in measurement technique, in order to improve
convergent validity, is very pertinent. From the multitrait-multimethod
perspective the typical psychometric approach, which attempts to devise 
alternate forms with almost identical format, would be criticized 
as lacking in convergent validity. 

A variety of analytical methods have been proposed for
multitrait-multimethod data (e.g., Boruch, Larkin, Wolins, & McKinney,
1970); however, only Jöreskog's confirmatory factor analytic approach
will be considered here. Suppose that it were assumed that each
observed measure were a function of only one trait ($Y_j$) and one
method ($M_k$) factor in a linear fashion, i.e.,

$$X_{jk} = a_{jk} Y_j + b_{jk} M_k + I_{jk} + e_{jk}$$

where $X_{jk}$ = measure reflecting combination of trait $j$ and method $k$,

$a_{jk}$ = regression weight of $X_{jk}$ on trait $Y_j$, and

$b_{jk}$ = regression weight of $X_{jk}$ on method $M_k$.

Assume also that all residuals are independent of each other and of all
factors. It may be shown that at least three traits and three methods
must be used in order for this model to be identified, given that all
factors may be oblique, i.e., correlated. To understand the connection
with models discussed earlier, consider two different method measures
of the same trait, e.g., $X_{11}$ and $X_{12}$ (illustrated in Fig. 4). It
can be seen that there are several sources of the observed correlation
$\rho_{X_{11}X_{12}}$, i.e.,

$$\rho_{X_{11}X_{12}} = a^*_{11}a^*_{12} + a^*_{11}b^*_{12}\rho_{Y_1M_2} + b^*_{11}a^*_{12}\rho_{M_1Y_1} + b^*_{11}b^*_{12}\rho_{M_1M_2},$$

where the starred weights are standardized. If the methods factors were
independent of the trait

factor, the model would in principle be like a congeneric model with 
correlated residuals. Such a model has been proposed by Guttman (1953) 
in relation to obtaining reliability estimates from nonindependent 
item data. If the methods factors were independent, we would have the
congeneric model basic to true score theory. Traditional test theory
approaches may therefore be considered the special case of the
multitrait-multimethod approach in which the methods factors are
assumed to be independent of each other and of the trait factors. The
notion of reliability as the ratio of true variance to observed
variance is only meaningful in the case where errors are independent;
there is no neat partitioning of variance in the multitrait-multimethod
approach.
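The decomposition of a same-trait, different-method correlation into its sources can be illustrated numerically. The following is a minimal sketch, assuming standardized variables and purely hypothetical weights and factor correlations (none of these numbers come from the report); the function name is likewise an illustrative choice.

```python
def correlation_sources(a11, a12, b11, b12, r_y1_m1, r_y1_m2, r_m1_m2):
    """Split the implied correlation of X11 and X12 into its sources.

    X11 = a11*Y1 + b11*M1 + e11 and X12 = a12*Y1 + b12*M2 + e12, with all
    variables standardized and residuals independent (assumed here).
    """
    trait_part = a11 * a12                                   # shared trait
    trait_method_part = a11 * b12 * r_y1_m2 + b11 * a12 * r_y1_m1
    method_part = b11 * b12 * r_m1_m2                        # correlated methods
    return trait_part, trait_method_part, method_part


# Hypothetical values: methods correlated with each other but not with the trait.
parts = correlation_sources(0.7, 0.6, 0.4, 0.5, 0.0, 0.0, 0.3)
total = sum(parts)

# With independent methods factors only the trait term remains.
congeneric_total = sum(correlation_sources(0.7, 0.6, 0.4, 0.5, 0.0, 0.0, 0.0))
```

In this sketch the observed correlation exceeds the part due to the trait alone whenever the method factors are positively correlated, which is the sense in which similar methods can inflate apparent convergence.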



Functional Relationships Among Factors



Whereas the econometricians and path analysts [...] theory focus only
on errors of measurement. [...] arises because [...] avoid making causal
inferences from correlations; some [...] (i.e., causally prior)
variables are [...] to insure that a particular correlation is not
spurious, [...] systematic procedures for analyzing sources of a
correlation (e.g., path analysis) are viewed with suspicion.

The function of causal hypotheses can be illustrated by an example
taken from Werts and Linn (1970). [...] causal relationship between
variables ($Y_2 = BY_1 + v$), where $Y_2$ is measured directly and $Y_1$
indirectly by two indicators ($X_1 = b_1Y_1 + I_1 + e_1$;
$X_2 = b_2Y_1 + I_2 + e_2$). This is a single factor model and $B^*$ may
be estimated as: [...] For example, if [...] = .20, [...]. Most
educational [...] effects would not [...] same underlying construct
(i.e., [...]) [...] regression equation:

$$Y_2 = b_1X_1 + b_2X_2 + I_3 + e_3$$

yielding standardized weights of $b^*_1 = -.33$ and $b^*_2 = +.67$. If for
example $X_1$ were proportion of faculty with doctorates and $X_2$ were
number of books per pupil in the library, it might well be supposed
that both of these variables are indicators of school affluence (i.e.,
$Y_1$). Certainly the regression procedure, which is typical of school
effects studies, would yield no hint of how $Y_1$ influences $Y_2$, i.e.,
the weights $b^*_1$ and $b^*_2$ are opposite in sign, yet both reflect the
same underlying variable. The use of regression represents an attempt
to avoid theory, finding influences by seeing if a variable increases
the percentage of predictable variance in the outcome. It is better to
specify the theoretical structure being postulated, so that
appropriate analytical procedures may be designed.
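The contrast between the regression weights and the single factor interpretation can be checked with a few lines of arithmetic. This is a sketch under stated assumptions: the three correlations below are hypothetical values chosen only because they reproduce the quoted weights of -.33 and +.67, and the expression for $B^*$ is the usual single factor identity (each correlation with the outcome equals a standardized loading times $B^*$, and the correlation between indicators equals the product of the loadings), not necessarily the expression printed in the source.

```python
import math


def standardized_weights(r1y, r2y, r12):
    """Standardized regression weights of an outcome on two predictors."""
    denom = 1.0 - r12 ** 2
    return (r1y - r12 * r2y) / denom, (r2y - r12 * r1y) / denom


def single_factor_effect(r1y, r2y, r12):
    """Standardized effect of the common factor on the outcome.

    Assumes both predictors are indicators of one factor with independent
    residuals, so that r1y = l1*B, r2y = l2*B and r12 = l1*l2.
    """
    return math.sqrt(r1y * r2y / r12)


# Assumed correlations: X1 with Y2, X2 with Y2, and X1 with X2.
r1y, r2y, r12 = 0.20, 0.40, 0.80
b1, b2 = standardized_weights(r1y, r2y, r12)
b_star = single_factor_effect(r1y, r2y, r12)
```

Under these assumed values the two regression weights have opposite signs while the single factor effect is positive, which is the interpretive problem the example describes.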



6. Growth Studies



Another area where it is important to specify functional
relationships is in the study of the determinants of growth. Test
theorists have long been concerned with the problem of estimating
change in the presence of errors of measurement (e.g., Harris, 1963). A
feature of this area is that an initial status and a final status
measure are assumed to have identical units of measurement. If the
initial status is $X_1 = b_1Y_1 + I_1 + e_1$ and the final status
$X_2 = b_2Y_2 + I_2 + e_2$, then the equal units assumption is equivalent
to $b_1 = b_2$. Various procedures (e.g., Cronbach & Furby, 1970) have
been devised for estimating true change $Y_2 - Y_1$ from the observed
scores and known reliability coefficients for the initial and final
measures. From these data a measure of the reliability of differences,
i.e., the correlation of the observed difference $X_2 - X_1$ with the
true difference $Y_2 - Y_1$, may be obtained. It was originally thought
that if the reliability of differences was low then our ability to
estimate true change would be low; however, Cronbach and Furby (1970)
and Werts and Linn demonstrated the use of information on other
variables to help estimate change. The logic of this approach is an
extension of the rationale [...] can be used to estimate model
parameters and therefore to improve estimates of factor scores.

Several educational researchers (Bloom; Thorndike, 1966) have been
concerned with the determinants of $Y_2 - Y_1$ and in particular have
argued that if the initial status ($Y_1$) is uncorrelated with gain
($Y_2 - Y_1$) then the determinants of change during this time
interval are different from those which produced the initial level of
competence ($Y_1$). No such conclusion is warranted (Werts, Jöreskog,
& Linn, 1972) since without including in the functional model various
determinants of growth, it is impossible to make any statements about
the effect of these determinants. As the path analysts have so
frequently shown, no correlation, even zero, is interpretable in a
causal sense except in the framework of a causal model. It is quite
possible, because of counterbalancing influences, for $Y_2 - Y_1$ to be
uncorrelated with $Y_1$ and yet initial status may influence gain
either positively or negatively.
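The point about counterbalancing influences can be made concrete. The sketch below assumes a hypothetical linear model in which gain depends on initial status and on one other determinant that is itself correlated with initial status; all coefficients are invented for illustration.

```python
def cov_gain_with_initial(d_determinant, d_initial, cov_det_initial, var_initial):
    """Covariance of gain with initial status under an assumed model.

    gain = d_determinant * T1 + d_initial * Y1 + disturbance, where the
    disturbance is independent of both T1 and Y1.
    """
    return d_determinant * cov_det_initial + d_initial * var_initial


# Initial status raises gain directly (+0.5), but a determinant that is
# negatively related to initial status (-0.5) also raises gain (+1.0).
positive_case = cov_gain_with_initial(1.0, 0.5, -0.5, 1.0)

# Initial status lowers gain directly (-0.5); the determinant is now
# positively related to initial status (+0.5).
negative_case = cov_gain_with_initial(1.0, -0.5, 0.5, 1.0)
```

In both hypothetical cases the covariance of gain with initial status is zero even though the direct effect of initial status on gain is not, so a zero correlation by itself says nothing about the underlying influences.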
An important feature of growth studies is that the variance of
the initial and final status factors ($Y_1$ and $Y_2$) is identified
by the scaling assumption $b_1 = b_2$. For convenience test theorists
usually assign the value $b_1 = b_2 = 1$, i.e., that the factors have
the same units as the observed measures, the variance of the factors
then being determined by the known reliabilities. In the typical
achievement study the true variance increases over time
($V_{Y_2} > V_{Y_1}$), e.g., because some students will pursue the study
of mathematics whereas others will avoid advanced courses. The usual
factor and path analysis approach of standardizing all factors (e.g.,
$V_{Y_1} = V_{Y_2}$) is clearly unsatisfactory for growth studies because
it ignores changes in true variance. Even if there were no errors of
measurement, standardization of variables is undesirable in growth
studies. Psychometricians have usually dealt with models in which one
measure of a construct was available, but when several measures with
different units are obtained the variance of the construct becomes
arbitrary. If the initial status factor is assigned a variance of unity
($V_{Y_1} = 1$) then the assumption $b_1 = b_2$ will identify the variance
of the final status factor (Werts & Linn, 1970b) given that $b_1$ and
$b_2$ are identified. Werts, Jöreskog, and Linn (1972) show that if we
have, e.g., two congeneric measures of $Y_1$ ($X_{11}$ and $X_{12}$) which
are repeated at a later time ($X_{21}$ and $X_{22}$ respectively), then it
is possible to test whether the assumption $b_1 = b_2$ is compatible with
$b_3 = b_4$. In other words the ratio $V_{Y_2} : V_{Y_1}$ identified by the
assumption that $b_1 = b_2$ may be different from the ratio of these
variances given by the assumption that $b_3 = b_4$ and this will show
up as a significant increase in lack of fit of the model to the data
when the added assumption $b_3 = b_4$ is imposed on the model. This
test indicates whether it is reasonable to believe that both measures
have equal units over time.
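How known reliabilities fix the factor variances when the loadings are set to unity can be sketched as follows. The observed variances and reliabilities are hypothetical, and the helper name is an illustrative assumption.

```python
def true_variance(observed_variance, reliability):
    """True score variance when the factor has the units of the measure."""
    return reliability * observed_variance


# Hypothetical initial and final measures on the same scale (b1 = b2 = 1).
v_initial = true_variance(100.0, 0.80)
v_final = true_variance(150.0, 0.90)

# Ratio of final to initial true variance; standardizing both factors
# would force this ratio to 1 and hide the growth in true variance.
variance_ratio = v_final / v_initial
```

With these assumed values the true variance grows by roughly two thirds over the interval, information that is discarded if both factors are standardized.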

7. Other Constructs in Statistical Procedures



In this section we propose to demonstrate that statistical 
procedures frequently imply constructs which many researchers are not 
aware of. For illustrative purposes consider a quasi-experimental
(Campbell & Stanley, 1963) study in which four different procedures
for teaching fifth grade mathematics are randomly assigned to four
available schools in a district. The mathematics achievement of each
student is measured at the beginning and end of fifth grade using
parallel forms of a test which provide good coverage of the material
taught in the various schools (i.e., the test has face validity).
As frequently happens in naturalistic studies it is found that the mean
achievement scores at the beginning of the fifth grade differ. To
avoid interpretive complications assume perfect validity. Suppose
that the mean results for schools are as shown in Fig. 5, i.e., the
ordering of the schools remained constant over time but the spread of
means increased in proportion to the initial mean. One possible
statistical procedure which the data seem to fit is the analysis of
variance of repeated measures (Winer, 1962, Chap. 7) which basically
consists of subtracting the initial means from the final means and
testing to see if these differences are the same from school to school.
Since these differences range from 20 units to 5 units for school #1
and #4 respectively, it is clear that this procedure would conclude
that there is a treatment (i.e., school) effect, i.e., school #1 is
the most and #4 the least effective. A second statistical procedure
which the data fit is the analysis of covariance with initial status
controlled (Winer, 1962, Chap. 11). Since the final means are
perfectly correlated with the initial means it may be shown that
this procedure will indicate no treatment (i.e., school) effect,
given the standard analysis of covariance assumptions (Werts & Linn,
1972). In order to understand these seemingly contradictory
interpretations, we need to ponder the following hypothetical question:
For any given school, what would the final mean be if no treatment
had been applied? The analysis of variance in essence assumes that
for each school, if no treatment had been given, then the final mean
would be the same as the initial mean. In contrast the analysis of
covariance assumes that if no treatment were given then the final
mean would be completely predictable from the initial mean, i.e., in
our illustration the final means are perfectly correlated with initial
means. There is no law of nature that either case is necessarily so,
which means that neither statistical procedure may be appropriate.
Furthermore, our analysis has assumed the appropriateness of a linear
addition model, which may not provide a reasonable simulation of the
reality being investigated.
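The two contradictory conclusions can be reproduced with a small numerical sketch. The school means below are hypothetical (Fig. 5 is not reproduced here); they were chosen so that gains run from 20 to 5 units and the final means are exactly proportional to the initial means. The common slope of 1.5 used for the covariance adjustment is also an assumption, standing in for the pooled within-school slope under the standard assumptions.

```python
# Hypothetical school means, school #1 first.
initial = [40.0, 30.0, 20.0, 10.0]
final = [1.5 * m for m in initial]   # spread grows with the initial mean

# Repeated-measures view: compare raw gains from school to school.
gains = [f - i for i, f in zip(initial, final)]


def adjusted_means(initial_means, final_means, slope):
    """Final means adjusted to a common initial mean using a given slope."""
    grand_initial = sum(initial_means) / len(initial_means)
    return [f - slope * (i - grand_initial)
            for i, f in zip(initial_means, final_means)]


# Covariance-adjustment view: remove what is predictable from initial status.
adjusted = adjusted_means(initial, final, slope=1.5)
```

The gains differ across schools, suggesting a treatment effect, while the adjusted means are identical, suggesting none. Which reading is appropriate depends on what the final means would have been without treatment, which the data alone cannot supply.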

A slight variation in the above problem occurs when some measure
is being obtained in a time series and at some point a new treatment
is imposed. Such a case might be in the math achievement of students
who are being followed from grade school into high school.
Thistlethwaite and Campbell (1960) have argued that if the
post-treatment trend continues on the pretreatment trend then no
treatment effect may be inferred. In real life, however, students who
go to a superior high school have probably gone to superior grade
schools and vice versa. If so, then it is quite possible that the
effective high school would do well if it could continue the learning
progress its students were making before entry. A treatment effect
might well be evidenced by a straight trend line from grade school
through high school. Again, the unobserved construct is: What would
the group mean be if there were no treatment? Without this information
no statement about treatment effects is warranted nor can anybody
validly assert that a particular statistical analysis is appropriate,
except within the context of a particular model with its associated
assumptions.



8. Hypotheses About Changes in Means

The discussion to this point has been devoted to the analysis of 
the observed variance-covariance matrix. In some problems, however, 
hypotheses really concern structures (i.e., restrictions) on the means 
of variables, e.g., if we gave a class some special assistance in 
vocabulary we would like to observe an increase in the average
vocabulary score of the group, i.e., the correlation between initial
and final vocabulary scores would not be the relevant statistic to
analyze. In such cases the neglect of means (common among path
analysts) would lead to uninterpretable results.

Educational researchers interested in growth have encountered 
the problem of means because of the way that tests are constructed 
(e.g., Carver, 1970) . The procedures used in test development 
typically strive to maximize the discrimination between individuals, 
e.g., items that are answered correctly by almost everyone at the end
of a course tend to be omitted since these serve to show similarities
among individuals. Yet it may be precisely these items that show the
general progress of the class during the course. The item analysis
procedures thus prevent measurement of true change in means over time.
Consider the extreme case in which the students have no familiarity
with the subject matter being taught, which would mean that an
initial test of their knowledge in this subject would yield a zero
score for the whole class. A parallel test given at the end of the
course would show varying degrees of knowledge attained, i.e., a
positive mean and variance. The initial test scores would be
expected to have a zero (meaningless) correlation with the final
scores and the final mean would represent the average level of course
effectiveness. If initially students had little or no familiarity
with the subject matter then the reliability of the initial test
might be quite low and yet this measure might be appropriate for
measuring changes in student knowledge during the course. Obviously
path coefficients would be irrelevant to the issue.
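The extreme case can be written out directly. This is a minimal sketch with invented scores for five students; it only illustrates that the mean change is informative while covariance-based statistics are not.

```python
from statistics import mean, pvariance

# Hypothetical scores: no prior familiarity, varied knowledge at the end.
initial = [0, 0, 0, 0, 0]
final = [12, 15, 9, 20, 14]


def covariance(xs, ys):
    """Population covariance of two equally long score lists."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


mean_gain = mean(final) - mean(initial)          # average course effect
cov_initial_final = covariance(initial, final)   # carries no information
initial_variance = pvariance(initial)            # zero, so no correlation exists
```

Because the initial scores have no variance, any correlation or path coefficient involving them is undefined or meaningless, yet the change in means summarizes what the class learned.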

As noted above, parallel tests are assumed to have the same 
underlying mean. Thus, underlying the various observed test means, 
there is assumed to be a common true score mean. If the means do not 
differ significantly, then the best estimate of the true mean is the
grand mean of the observed tests. Notice that if the grand mean is
used as the best estimate of the common test mean, then this will
affect our estimates of variances and covariances since these are
measures of deviation from the grand mean. This mutual interdependence
is recognized in Jöreskog's (1970) general model, which allows for
simultaneous estimation and hypothesis testing given restrictions on
both means and the variance-covariance matrix. We may, for example,
wish to test the hypothesis that the true score means over time
increase linearly (or exponentially).

9. General Considerations 

It is relatively easy to find a linear structural model which 
fits the data quite closely, e.g., factor analysts may keep adding 
factors until a good fit is obtained. With a modicum of thought it 
is also relatively easy to obtain a model which is consistent with
our theory, when this model is just identified (i.e., there is a
unique solution for each parameter), because the matrix estimated
from the model ($\Sigma$) will in general equal the observed matrix W.
Given overidentification, it is possible that the model may be
rejected because of poor fit to the data. In such cases it is
usually possible to find a less restrictive model which will fit the
data better, but this model may not be substantively plausible. It
is extremely difficult to demonstrate that (a) a model approximately
simulates reality, (b) it provides better simulation than another
model, (c) the constructs defined by the model have greater
explanatory power than the observed variables from which they are
derived, and (d) these constructs are in any sense useful in promoting
better research. In most cases it seems reasonable to suppose that
several plausible models may be found, all of which are consistent with
the observed data. It would then be necessary to deduce what data would
need to be collected to discriminate among these models.



Some of the concepts discussed in previous sections suggest some 
cautions in interpreting observed variance-covariance matrices. Grant- 
ing the validity of using correlations at all (see Tukey, 1954, for a 
discussion of this question), it should be clear from the section on 
the multitrait-multimethod procedure that the probable existence of
errors of measurement and multiple indicators of unmeasured variables
will necessarily make any interpretation a chancy affair. Furthermore,
even if the unattenuated correlations among the relevant constructs
were known, correlations are by no means self-interpreting in a causal
sense (Blalock, 1964). Thus an observed correlation may be completely
spurious due to the presence of a common antecedent variable (which
must be controlled). While most psychologists use the concept of
spuriousness, the notion of controlling a variable in a chain of causes
to see if this variable explains the observed association (Blalock,
1964) is almost unknown at present. It should not be inferred, however,
that a causal analysis of the correlations is appropriate to every
problem (Bailey, 1970).

Most applications of factor analysis, path analysis, and test 
theory can probably be described as exploratory or speculative in the 
sense that the analysis was performed because the researcher was
familiar with that technique rather than because it could be
demonstrated that his approach provided a better simulation of the
process under study. We are thus in the unenviable position of
discussing statistical techniques without knowing when they should be
used. The value of these techniques has yet to be demonstrated in
most of the social sciences with the possible exception of economics.
IV. A Synthesis of Psychometric Literature: A Multitrait-Multimethod
Model for Studying Growth

Werts and Linn (1970a) have suggested that a multitrait- 
multimethod approach (Campbell & Fiske, 1959) might be used for study- 
ing growth. The purpose of this paper is to detail such a model and 
to outline implications for the study of growth. The major focus of 
our exposition will be the logic of this model rather than the
estimation of parameters or testing the fit of the model to data. A
comprehensive discussion of appropriate estimation and fit-testing
procedures may be found in Jöreskog (1970a), whose general model for the
analysis of covariance structures subsumes the models used in this paper.

The Model

The multitrait-multimethod approach may be treated as a problem
in confirmatory factor analysis (Jöreskog, 1970a, 1971). For
illustrative purposes we will consider the example of three traits and
three methods since this is the minimum number of traits and methods
required to produce unique (defined in Jöreskog, 1969, pp. 185-186)
parameter estimates, given the assumption that each observed measure
loads on only one trait and one method factor and all factors are
oblique. The general factor analytic model is:

$$y = \mu + \Lambda T + e \qquad (1)$$

where $y$ is the vector of observed scores,

$\mu$ is the mean vector of $y$,

$\Lambda$ is a matrix of factor loadings,

$T$ is a vector of common factor scores, and

$e$ is a vector of unique factor scores corresponding to
specific factors and/or errors of measurement.

For our example:

$$y' = (y_{11}, y_{21}, y_{31}, y_{12}, y_{22}, y_{32}, y_{13}, y_{23}, y_{33}), \qquad (1a)$$

where in $y_{ij}$, $i$ = method and $j$ = trait,

$$T' = (T_1, T_2, T_3, M_1, M_2, M_3), \qquad (1b)$$

where $T_j$ = the $j$-th trait factor,

$M_i$ = the $i$-th method factor,
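and

$$\Lambda = \begin{bmatrix}
A_{11} & 0 & 0 & B_{11} & 0 & 0 \\
A_{21} & 0 & 0 & 0 & B_{21} & 0 \\
A_{31} & 0 & 0 & 0 & 0 & B_{31} \\
0 & A_{12} & 0 & B_{12} & 0 & 0 \\
0 & A_{22} & 0 & 0 & B_{22} & 0 \\
0 & A_{32} & 0 & 0 & 0 & B_{32} \\
0 & 0 & A_{13} & B_{13} & 0 & 0 \\
0 & 0 & A_{23} & 0 & B_{23} & 0 \\
0 & 0 & A_{33} & 0 & 0 & B_{33}
\end{bmatrix} \qquad (1c)$$

where the $A_{ij}$ are loadings on trait factors and the $B_{ij}$ are
loadings on method factors.
The expected variance-covariance matrix $\Sigma$ of $y$ is then given by

$$\Sigma = \Lambda \Phi \Lambda' + \Theta^2 \qquad (2)$$

where $\Theta^2$ is a diagonal matrix whose elements are the variances of
$e$. Since all factors are oblique, in our example:

$$\Phi = \begin{bmatrix}
V_{T_1} & & & & & \\
C_{T_1T_2} & V_{T_2} & & & & \\
C_{T_1T_3} & C_{T_2T_3} & V_{T_3} & & & \\
C_{T_1M_1} & C_{T_2M_1} & C_{T_3M_1} & V_{M_1} & & \\
C_{T_1M_2} & C_{T_2M_2} & C_{T_3M_2} & C_{M_1M_2} & V_{M_2} & \\
C_{T_1M_3} & C_{T_2M_3} & C_{T_3M_3} & C_{M_1M_3} & C_{M_2M_3} & V_{M_3}
\end{bmatrix} \qquad (2a)$$

where the $C$'s are covariances and the $V$'s are variances.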

Following Jöreskog (1970a), parameters will be labelled as one of
three kinds: (1) fixed parameters that have been assigned given values;
(2) constrained parameters that are unknown but equal to one or more
other parameters; and (3) free parameters that are unknown and not
constrained to be equal to any other parameter. The term "identifiable"
will be used in the sense defined by Fisher (1966, p. 25): "we shall
speak of that equation as identifiable (or identified) if there exists
some combination of prior and posterior information which will enable us
to distinguish its parameters from those of any other equation in the
same form." For the models studied in this paper, the term "identifiable"
is synonymous with the factor analyst's term "unique solution," i.e., a
solution is "unique" if all linear transformations of the factors that
leave the fixed parameters unchanged also leave the free parameters
unchanged. As Jöreskog (1970b) notes: "Before an attempt is made to
estimate a model of this kind, the identification problem must be
examined." The number of overidentifying restrictions on the model is
frequently of interest, for example, after standardizing factor variances
(i.e., $V_{T_j} = V_{M_i} = 1$) the three method by three trait model has
three overidentifying restrictions, i.e., $\Sigma$ has 45 distinct
variances and covariances as compared to 42 free parameters to be
estimated (18 factor loadings, 15 factor covariances in $\Phi$, and nine
residual variances in $\Theta^2$). The number of overidentifying
restrictions is the degrees of freedom (df) for the test statistic in
Jöreskog's general model (1970a, p. 241, sec. 1.4). The "path analysis"
approach used by Werts and Linn (1970a) can be very useful in exploring
the identification question in overidentified models. However, as noted
by Hauser and Goldberger (1970), the "path analysis" literature does not
adequately deal with the estimation problem in overidentified models, in
part because the sample-population distinction is blurred.
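The count of overidentifying restrictions for the three trait by three method model can be verified by simple bookkeeping. The sketch below assumes the structure described in the text (one trait and one method loading per measure, standardized and freely correlated factors, one residual variance per measure); the variable names are illustrative.

```python
def distinct_moments(n_observed):
    """Number of distinct variances and covariances among the measures."""
    return n_observed * (n_observed + 1) // 2


n_traits, n_methods = 3, 3
n_observed = n_traits * n_methods          # every trait measured by every method
n_factors = n_traits + n_methods

loadings = 2 * n_observed                  # one trait and one method loading each
factor_covariances = n_factors * (n_factors - 1) // 2   # variances fixed at 1
residual_variances = n_observed

free_parameters = loadings + factor_covariances + residual_variances
degrees_of_freedom = distinct_moments(n_observed) - free_parameters
```

The same bookkeeping applies to any restricted version of the model: each additional fixed or constrained parameter adds one degree of freedom.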

The multitrait-multimethod approach considered above does not 
consider any functional relationships among the trait factors, i.e., 
the approach deals only with errors of measurement. In the study of 
growth, these trait factors correspond to initial status, final status, 
and the determinants of growth and a structural model showing the rela- 
tionship among these variables must be specified. Substantive inferences 
about growth are based on estimates of the parameters of the structural 
model. 

Suppose that the structural model for growth took the form:

$$T_3 = D_1T_1 + D_2T_2 + \zeta \qquad (3)$$

where $T_3$ is the final status, $T_2$ is the initial status, and $T_1$ is
a determinant of growth; all other influences on growth (represented by
$\zeta$) being independent of $T_1$ and $T_2$. In this model the initial
status $T_2$ may influence the rate of growth. The parameters of equation
(3) are just identifiable in terms of the elements of $\Phi$, i.e., the
number of restrictions on the overall model is not changed. Assuming
that $T_3$ and $T_2$ are measurements on the same dimension as implied by
the terms "initial" and "final" status, growth ($\Delta$) is equal to
$T_3 - T_2$. Werts and Linn (1970b) have shown that the regression
weights for $T_1$ and $T_2$ are:

$$D_1 = D_{\Delta T_1 \cdot T_2} \qquad (4)$$

and

$$D_2 = D_{\Delta T_2 \cdot T_1} + 1 \qquad (5)$$

where $D_{\Delta T_1 \cdot T_2}$ is the regression weight of $\Delta$ on
$T_1$ with $T_2$ controlled and $D_{\Delta T_2 \cdot T_1}$ is the regression
weight of $\Delta$ on $T_2$ with $T_1$ controlled. In other words $D_1$
represents the direct influence of $T_1$ on growth and $D_2$ represents
the direct influence of initial status on growth plus unity (which
represents that part of $T_3$ which is initial status). Since
$T_3 = \Delta + T_2$, substituting equations (4) and (5) into (3) yields:
respectively. If the analyst wished to scale a factor by the unit of a
particular measure this may be accomplished by setting the $A_{ij}$ slope
for the measure equal to unity (in which case the variance of the
corresponding factor should not be standardized but left free to be
estimated by the program). The assumption that $T_2$ and $T_3$ are
measures on the same dimension is equivalent to setting the same method
regression weights equal, i.e., in our example $A_{12} = A_{13}$,
$A_{22} = A_{23}$, and $A_{32} = A_{33}$. As detailed by Werts and Linn
(1970a) the effect of these restrictions is that the ratio of the
variance of $T_3$ to $T_2$ is fixed. For estimation purposes it is
convenient to standardize all factors except $T_3$ whose variance is
fixed in relation to that of $T_2$. The model defined by equations (7a),
(7b), and (7c) is no longer a simple factor analysis model, but may be
estimated using Jöreskog's (1970a) general model for the analysis of
covariance structures. For this purpose $\Lambda^*$ may be rewritten as
the product of two matrices:

$$\Lambda^* = B\Lambda^{**}$$

where

[...]

and $x_{13} = B_{13}/A_{13}$, $x_{23} = B_{23}/A_{23}$, $x_{33} = B_{33}/A_{33}$.

By substitution:

$$\Sigma = B\Lambda^{**}\Phi^*\Lambda^{**\prime}B' + \Theta^2,$$

which is a special case of Jöreskog's (1970a) general model.

In using the computer program (Jöreskog, Gruvaeus, & van Thillo,
1970) the parameters $A_{12}$, $A_{22}$, $A_{32}$ in $\Lambda^{**}$ should be
constrained to be equal to $A_{13}$, $A_{23}$, and $A_{33}$ respectively in
$B$. The resulting model has 45 distinct variances and covariances in
$\Sigma$ and 40 free and constrained parameters (17 in $\Lambda^{**}$, 14 in
$\Phi^*$, 9 in $\Theta$, none in $B$ because of equality restraints), which
means that the model has five overidentifying restrictions (df). The
advantage of casting the analysis in terms of Jöreskog's general model
is that, given the assumption that the observed variables are
distributed normally, various hypotheses about the model may be tested
in large samples. In particular, we may wonder if trait factors are
uncorrelated with methods factors and methods factors with each other
as assumed by Cronbach and Furby (1970) and Werts and Linn (1970a) in
their analysis of growth. To make this test, the analysis would be run
with the model of (1a), (1b), and (1c), with $V_{T_j} = V_{M_i} = 1$, and
then the analysis would be repeated with the covariances between trait
and method factors and among method factors set equal to zero
($C_{T_jM_i} = C_{M_iM_{i'}} = 0$). The initial analysis would yield a
chi-square with three df for testing the fit of the model to the data.
The second analysis would yield a chi-square with 15 df since 12
additional restrictions have been made. The increase in chi-square with
12 df is a test of the tenability of the
additional restrictions. Starting with the same initial model, the
tenability of assuming that $A_{12} = A_{13}$, $A_{22} = A_{23}$, and
$A_{32} = A_{33}$ may be tested (dropping the $V_{T_3} = 1$ assumption)
using the increase in chi-square with 2 df. Likewise starting with these
assumptions (i.e., equations (7a), (7b), and (7c), and df = 5)
hypotheses about growth can be tested, e.g., $D$ [...] can be set [...]
for a test of whether $T$ [...] directly [...] (see equation (4)), the
increase in $\chi^2$ [...] testing this hypothesis. The fit of the
observed variance-covariance matrix $S$ to the estimated elements of
$\Sigma$ may be used to form some judgment as to changes in fit resulting
from additional restrictions, especially when the $\chi^2$ test is
inappropriate because the assumption of multivariate normality
is not reasonable.
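The nested-model comparison described here amounts to a chi-square difference test. The following is a minimal sketch: the two chi-square values are hypothetical, and the tail probability uses the closed-form expression that holds only when the degrees of freedom are even (as they are for the 12 df and 2 df comparisons in the text). It is not the computation performed by the program cited above.

```python
import math


def chi2_upper_tail_even_df(x, df):
    """Upper-tail probability of a chi-square variate for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k        # builds (x/2)^k / k!
        total += term
    return math.exp(-half) * total


def difference_test(chi2_base, df_base, chi2_restricted, df_restricted):
    """Increase in chi-square and df when restrictions are added."""
    d_chi2 = chi2_restricted - chi2_base
    d_df = df_restricted - df_base
    return d_chi2, d_df, chi2_upper_tail_even_df(d_chi2, d_df)


# Hypothetical fit statistics: base model (3 df) versus the model with the
# 12 trait-method and method-method covariances fixed at zero (15 df).
d_chi2, d_df, p_value = difference_test(2.1, 3, 30.4, 15)
```

A small tail probability for the increase would argue against the added restrictions, subject to the large-sample and normality assumptions noted in the text.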



As originally conceived by Campbell and Fiske (1959) the
multitrait-multimethod [...] measured with each method. [...] method, in
order to fix the ratio of the variance of the final status factor to
the initial status factor, only one pair of initial and final measures
with the same units of measurement is required, i.e., the three sets of
initial-final measures [...] were replaced with different method
measures, even though the resulting matrix would no longer [...]
required by Campbell and Fiske [...] appear fundamental [...]
postulated, and does not allow for nonsymmetrical method-by-trait
combinations.

Relationship to Classical Test Theory

The multitrait-multimethod [...] and two finally ($y_{13}$, $y_{23}$).
First let us consider the analysis given the traditional assumptions
that all errors of measurement are independent of each other and of the
true scores. In our formulation this is equivalent to asserting that
there are no methods factors. Without further assumptions the model may
be represented in terms of equation (1) as

$$y' = (y_{12}, y_{22}, y_{13}, y_{23})$$

$$T' = (T_2, T_3)$$
Assuming that initial and final status are on the same scale, "parallel"
test assumptions are equivalent to (Jöreskog, 1971) fixing
$A_{12} = A_{22} = A_{13} = A_{23} = 1$ and constraining
$V_{e_{12}} = V_{e_{22}}$ and $V_{e_{13}} = V_{e_{23}}$. All parameters are
identifiable and df = 5. Identification still occurs without the error
variance assumptions (df = 3), i.e., in true score lexicon,
"essentially tau-equivalent" measures (Lord & Novick, 1968, pp. 47-50)
would suffice. If we choose to use nonparallel or congeneric (Jöreskog,
1971) measures, one pair of measures over time being on the same scale
(e.g., $A_{12} = A_{13}$), $V_{T_2}$ could be arbitrarily standardized
(= 1) yielding an identifiable model with df = 1. In all these cases
growth statistics may be obtained from the parameter estimates or the
model can be transformed to obtain growth statistics directly. Inserting
$T_3 = T_2 + \Delta$ then:
where $A_{12} = A_{13}$ by assumption, and

[...] (9b)

[...] (9c)

where $V_{T_2} = 1$ for convenience.

Relevant growth statistics are:

$\rho_{T_2\Delta}$ = correlation of initial status with gain = [...]

[...] $+ V_{\Delta} + 2C_{T_2\Delta}$, and

[...]

Similarly if parameter estimates were derived from the original model
of equations (8a), (8b), (8c), (8d), and (8e), growth statistics can
be obtained by

[...]
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Following jBreskog (1971) the parallel test assumption can be tested 
(given multivariate normality) by comparing the chi-square for the
"essentially tau-equivalent" model to that for the "parallel" test
model; the difference in chi-square with df = 2 is a test of the
assumptions that $V_{e_{12}} = V_{e_{22}}$ and $V_{e_{13}} = V_{e_{23}}$.
Similarly the increase in chi-square from the "congeneric" model to the
"essentially tau-equivalent" model (df = 2) is a test of the assumptions
that $\lambda_{12} = \lambda_{22}$ and $\lambda_{13} = \lambda_{23}$. If
the parallel test assumptions are accepted then the population
reliability at the initial time may be estimated by
$V_{T_2} / (V_{T_2} + V_{e_{12}})$ and reliability at the final time by
$V_{T_3} / (V_{T_3} + V_{e_{13}})$. The reliability for each test is the
square of the corresponding standardized factor loading in the case of
"essentially tau-equivalent" or "congeneric" measures. Another statistic
of interest in the traditional psychometric literature is the reliability
of differences ($\rho_\Delta$), which is defined as the true variance of
the differences divided by the variance of the observed differences. In
the parallel case the estimated population error variances can be used to
obtain directly:



$$\rho_\Delta = \frac{V_\Delta}{V_\Delta + V_{e_{12}} + V_{e_{13}}} \qquad (12a)$$

With "essentially tau-equivalent" assumptions no statement is made
about equality of error variances, so that four reliabilities may be
estimated:

V A 

tl - r- : — . d2b) 
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P A = TTfrl S . (12e) 
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Formulas (12a), (12b), (12c), U2d) , and (12e) are based on the 
assumption that the true scores have the same units as the observed 
scores, which is not true in the case of congeneric measures. Since 
the regression of observed on true differences is equal to the regres- 
sion of observed on true scores (Werts & Linn, 1970a, equation (25)) 
it is only necessary to standardize this weight with the appropriate 
variances to obtain the reliability of differences for all cases, e.g.,
in the congeneric case if $\lambda_{12} = \lambda_{13}$ then



$$\rho_\Delta = \frac{\lambda_{12}^{2} V_\Delta}{V_{y_{12}} + V_{y_{13}} - 2C(y_{12}, y_{13})} \qquad (12f)$$

where $V_{y_{12}}$, $V_{y_{13}}$, and $C(y_{12}, y_{13})$ are the estimated
elements in $\hat{\Sigma}$. This formula uses estimated elements in
$\hat{\Sigma}$ which are provided in the computer output for Jöreskog's
program (Jöreskog, Gruvaeus, & van Thillo, 1970). The program computes
the elements in $\hat{\Sigma}$ from the estimates for the underlying
parameters, e.g., $C(y_{12}, y_{13}) = \lambda_{12} \lambda_{13} C_{T_2 T_3}$.

This model (all measurement errors independent) may be used to clarify
traditional procedures for obtaining growth statistics. For example,
consider the case in which one initial and one final test is given. A
common procedure is to obtain split half reliabilities at each time and
use these to correct for attenuation. If $y_{12}$ and $y_{22}$ are the
initial split halves and $y_{13}$ and $y_{23}$ the final split halves,
this case corresponds exactly to the parallel measure case analyzed
above. The difference from the traditional procedure is that the complete
variance-covariance matrix for the split halves is computed and used in
the analysis. As shown above, the "parallel" and "essentially
tau-equivalent" assumptions can be tested against the congeneric model
and the congeneric model is overidentified. From this perspective the
traditional procedure neglects useful information about correlations
among split halves and thereby loses the possibility of rejecting the
model because of poor fit to the data and of analyzing the data making
only congeneric test assumptions. To understand the connection with the
traditional formula it is of interest to standardize $\hat{\Sigma}$ into
a correlation matrix (correlations generated by the model are indicated
by the symbol $\rho$) and to show the relationships to standardized model
parameters (denoted by asterisk):

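$$\rho(y_{12}, y_{13}) = \lambda^*_{12} \rho_{T_2 T_3} \lambda^*_{13} \qquad (13a)$$
$$\rho(y_{12}, y_{23}) = \lambda^*_{12} \rho_{T_2 T_3} \lambda^*_{23} \qquad (13b)$$
$$\rho(y_{22}, y_{13}) = \lambda^*_{22} \rho_{T_2 T_3} \lambda^*_{13} \qquad (13c)$$
$$\rho(y_{22}, y_{23}) = \lambda^*_{22} \rho_{T_2 T_3} \lambda^*_{23} \qquad (13d)$$
$$\rho(y_{12}, y_{22}) = \lambda^*_{12} \lambda^*_{22} \qquad (13e)$$
$$\rho(y_{13}, y_{23}) = \lambda^*_{13} \lambda^*_{23} \qquad (13f)$$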

If parallel test assumptions are valid then $\lambda^*_{12} = \lambda^*_{22}$
and $\lambda^*_{13} = \lambda^*_{23}$, in which case equations (13a),
(13b), (13c), and (13d) are identical and should be recognized as the
traditional correction for attenuation, except that the correlations are
drawn from $\hat{\Sigma}$ rather than from the observed correlation
matrix S. Equations (13e) and (13f), under parallel test assumptions, are
simply the assumption that the reliability, defined as the squared
correlation (i.e., $\lambda^{*2}_{12}$ or $\lambda^{*2}_{13}$) of the
observed with the true score, is equal to the correlation between two
parallel tests, but again the correlations are drawn from $\hat{\Sigma}$,
not from S. What these equations show is that it is not necessary for the
reliabilities of the split halves to be equal in order to identify the
unattenuated correlation $\rho_{T_2 T_3}$, given uncorrelated errors. If
the estimates of the elements in $\hat{\Sigma}$ for the parallel case are
examined it will be found that because of the structural specifications:
$V_{y_{12}} = V_{y_{22}}$, $V_{y_{13}} = V_{y_{23}}$,
$C(y_{12}, y_{13}) = C(y_{12}, y_{23}) = C(y_{22}, y_{13}) = C(y_{22}, y_{23})$,
$C(y_{12}, y_{22}) = V_{T_2}$, $C(y_{13}, y_{23}) = V_{T_3}$, and
$C(y_{12}, y_{13}) = C(y_{22}, y_{23}) = C_{T_2 T_3}$. Translating
the equation for the reliability of differences into the elements of
$\hat{\Sigma}$:

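$$\rho_\Delta = \frac{C(y_{12}, y_{22}) + C(y_{13}, y_{23}) - 2C(y_{12}, y_{13})}{V_{y_{12}} + V_{y_{13}} - 2C(y_{12}, y_{13})} \qquad (14a)$$

or

$$\rho_\Delta = \frac{V_{y_{12}} \rho(y_{12}, y_{22}) + V_{y_{13}} \rho(y_{13}, y_{23}) - 2\rho(y_{12}, y_{13}) \sqrt{V_{y_{12}} V_{y_{13}}}}{V_{y_{12}} + V_{y_{13}} - 2\rho(y_{12}, y_{13}) \sqrt{V_{y_{12}} V_{y_{13}}}} \qquad (14b)$$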
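As a numerical check on the parallel-measures case, the following Python sketch computes the reliability of differences twice: once from the underlying parameters (true difference variance over observed difference variance) and once from the model-implied covariance elements. All numeric values and function names are illustrative assumptions, not estimates from any data set discussed in this report.

```python
# Hypothetical parameter values for the parallel-measures model (assumed).
V_T2, V_T3, C_T2T3 = 1.0, 1.5, 1.1   # true-score variances and covariance
V_e2, V_e3 = 0.5, 0.6                # common error variance at each occasion


def implied_sigma():
    """Model-implied variances and covariances needed for the difference score."""
    return {
        "V_y12": V_T2 + V_e2,
        "V_y13": V_T3 + V_e3,
        "C_y12_y22": V_T2,       # two parallel forms at the initial occasion
        "C_y13_y23": V_T3,       # two parallel forms at the final occasion
        "C_y12_y13": C_T2T3,     # cross-occasion covariance, independent errors
    }


def reliability_from_parameters():
    """True difference variance divided by observed difference variance."""
    V_delta = V_T2 + V_T3 - 2.0 * C_T2T3
    return V_delta / (V_delta + V_e2 + V_e3)


def reliability_from_sigma(s):
    """The same quantity written in the elements of the implied matrix."""
    num = s["C_y12_y22"] + s["C_y13_y23"] - 2.0 * s["C_y12_y13"]
    den = s["V_y12"] + s["V_y13"] - 2.0 * s["C_y12_y13"]
    return num / den


rho_a = reliability_from_parameters()
rho_b = reliability_from_sigma(implied_sigma())
print(round(rho_a, 4), round(rho_b, 4))
```

The two routes agree because, with independent errors, each same-occasion covariance between parallel forms equals the corresponding true variance.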



Equation (14b) should be recognized as the traditional formula for the
reliability of differences, noting however that the estimates are drawn
from $\hat{\Sigma}$, not from the observed matrix S. The essentially
tau-equivalent case differs from the parallel case in that the
corresponding variances in $\hat{\Sigma}$ are not required to be equal;
however, the covariances between independent measures of different traits
are still equal to the covariances between the corresponding trait
factors. This means that formula (14a) could be used for any pair of
tau-equivalent tests over time. For congeneric measures the formula
involves the pairs of measures which have the same units over time, e.g.,
if $\lambda_{12} = \lambda_{13}$ then equation (12f) may be translated
into
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'13: 



. (14 c) 



Equation (14c) is the reliability of differences formula given by Werts
and Linn (1970a, equation (26)) for the case of correlated errors over
time for the pair of measurements on the same scale, i.e., the Werts
and Linn formula is also appropriate to the independent error case when
applied to the elements of $\hat{\Sigma}$ rather than S. If formula (14c)
applies to correlated errors using congeneric measures then it may be
specialized for the parallel measures case, e.g., if $y_{12}$ and
$y_{13}$ have nonindependent errors and $y_{12}$ and $y_{23}$ have
independent errors:

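(a) $\lambda^*_{13} = \lambda^*_{23}$ , by parallel test assumptions, therefore
$\lambda^*_{12} \lambda^*_{13} \rho_{T_2 T_3} = \lambda^*_{12} \lambda^*_{23} \rho_{T_2 T_3}$ ,

(b) but $\rho(y_{12}, y_{23}) = \lambda^*_{12} \lambda^*_{23} \rho_{T_2 T_3}$ .

Since

$$\lambda^*_{12} = \sqrt{\rho(y_{12}, y_{22})} , \quad \lambda^*_{13} = \sqrt{\rho(y_{13}, y_{23})} , \quad V_{y_{12}} = V_{y_{22}} , \quad V_{y_{13}} = V_{y_{23}} ,$$

equation (14c) becomes

$$\rho_\Delta = \frac{V_{y_{12}} \rho(y_{12}, y_{22}) + V_{y_{13}} \rho(y_{13}, y_{23}) - 2\rho(y_{12}, y_{23}) \sqrt{V_{y_{12}} V_{y_{13}}}}{V_{y_{12}} + V_{y_{13}} - 2\rho(y_{12}, y_{13}) \sqrt{V_{y_{12}} V_{y_{13}}}} \qquad (14d)$$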



Equation (14d) is the formula for the reliability of differences for
"linked" (i.e., correlated errors) parallel test measures given by
Cronbach and Furby (1970, equation (6)), which can be seen to be the
parallel measure specialization of the Werts-Linn equation for
nonindependent congeneric measures. Similarly from equations (11a),
(11b), (11c), (11d), and (11e) it follows that the estimated correlation
of status with gain is:



$$\rho_{T_2 \Delta} = \frac{C_{T_2 T_3} - V_{T_2}}{\sqrt{V_{T_2} (V_{T_2} + V_{T_3} - 2C_{T_2 T_3})}} \qquad (15a)$$
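The correlation of status with gain follows from the covariance algebra of $T_2$ and $T_3 - T_2$. A minimal Python sketch is given below; the function name and the input values are assumptions chosen only for illustration.

```python
import math


def status_gain_correlation(v_t2, v_t3, c_t2t3):
    """Correlation of initial true status T2 with true gain (T3 - T2)."""
    cov_status_gain = c_t2t3 - v_t2             # Cov(T2, T3 - T2)
    var_gain = v_t2 + v_t3 - 2.0 * c_t2t3       # Var(T3 - T2)
    return cov_status_gain / math.sqrt(v_t2 * var_gain)


# Illustrative values (assumed): V_T2 = 1.0, V_T3 = 1.5, C_T2T3 = 1.2.
rho_status_gain = status_gain_correlation(1.0, 1.5, 1.2)
print(round(rho_status_gain, 4))
```

With these values the true gain variance is small relative to the status variance, which is why a modest covariance of status with gain still produces a sizeable correlation.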



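In the congeneric case with $\lambda_{12} = \lambda_{13}$, this may be
transformed into

$$\rho_{T_2 \Delta} = \frac{\rho_{T_2 T_3} \lambda^*_{13} \sqrt{V_{y_{13}}} - \lambda^*_{12} \sqrt{V_{y_{12}}}}{\sqrt{\lambda^{*2}_{12} V_{y_{12}} + \lambda^{*2}_{13} V_{y_{13}} - 2\rho_{T_2 T_3} \lambda^*_{12} \lambda^*_{13} \sqrt{V_{y_{12}} V_{y_{13}}}}} \qquad (15b)$$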



Formula (15b) is the correlation of status with gain given by Werts and
Linn (1970a, equation (28)) for the case of congeneric measures and
correlated errors, i.e., the formula applies also to the independent
error case. In the case of parallel independent measures
$\rho_{T_2 T_3} = \rho(y_{12}, y_{13}) / \sqrt{\rho(y_{12}, y_{22}) \rho(y_{13}, y_{23})}$,
which when substituted into formula (15b) yields the traditional formula
for the correlation of status with gain as applied to the elements of
$\hat{\Sigma}$:

$$\rho_{T_2 \Delta} = \frac{\rho(y_{12}, y_{13}) \sqrt{V_{y_{13}}} - \rho(y_{12}, y_{22}) \sqrt{V_{y_{12}}}}{\sqrt{\rho(y_{12}, y_{22})} \sqrt{\rho(y_{12}, y_{22}) V_{y_{12}} + \rho(y_{13}, y_{23}) V_{y_{13}} - 2\rho(y_{12}, y_{13}) \sqrt{V_{y_{12}} V_{y_{13}}}}} \qquad (15c)$$

Our purpose in demonstrating relationships to traditional formulations is
purely heuristic, since Jöreskog's program yields estimates of model
parameters given the structural assumptions specified by the
investigator, i.e., the traditional formulas apply to the elements of
$\hat{\Sigma}$, which are not directly observable but which are estimated
as a function of the parameter estimates. Traditional psychometric
approaches have dealt with models which are just identified, which means
that models which exactly reproduce the observed variance-covariance
matrix can be employed (i.e., S = $\hat{\Sigma}$). The limitation in this
approach is that overidentification is necessary if the fit of the model
to the data is to be tested.

In this paragraph we propose to use our model to specify the conditions
implicit in Cronbach's (1960, pp. 136-139) discussion of coefficients
of "stability" and "equivalence." Cronbach uses an example in which two
forms of the Mechanical Reasoning Test of the DAT were used, the same
forms being used for test and retest purposes. When the same form is
repeated, the test-retest correlation is higher than the test-retest
correlation between different forms, suggesting the presence of
"long-lasting test-specific" factors. The implication is that the errors
of measurement for the same test repeated are not independent. Assuming
that both forms were repeated and errors of measurement independent for
different forms, the model for parallel measures is of the form:

$$y = \mu + \Lambda T + e , \qquad (16a)$$

where

$$y' = (y_{12}, y_{22}, y_{13}, y_{23}) , \qquad (16b)$$

where $y_{12}$ and $y_{13}$ are the same test as are $y_{22}$ and $y_{23}$ ,

and
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T 3 , 
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e 23> 
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(16d) 
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(16e) 



The model of (16a) is the special case of factor analysis in which
the residual factors are treated as latent factors. Examination of
equation (16e) shows that the same test errors of measurement are
nonindependent, i.e., $C_{e_{12} e_{13}} \neq 0$ and
$C_{e_{22} e_{23}} \neq 0$. All parameters are identifiable and df = 3
(10 distinct elements in $\Sigma$ less 7 free and constrained
parameters). Essentially tau-equivalent assumptions would still have
provided identification but with only one overidentifying restriction
(since $V_{e_{12}} \neq V_{e_{22}}$ and $V_{e_{13}} \neq V_{e_{23}}$). An
interesting case occurs with congeneric assumptions, in which case the
model is underidentified; however, the unattenuated trait correlation
$\rho_{T_2 T_3}$ is just identified
[$\rho^2_{T_2 T_3} = C(y_{12}, y_{23}) C(y_{22}, y_{13}) / (C(y_{12}, y_{22}) C(y_{13}, y_{23}))$].
Identification may be achieved with the congeneric model by repeating
only one test (assuming $\lambda_{12} = \lambda_{13}$) and using
different method measures for $y_{22}$ and $y_{23}$, in which case the
model is:
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(16f) 



(16g) 



e 23 

This model is just identified (10 distinct elements in $\Sigma$ less 10
parameters to be estimated). Let us return to Cronbach's example in
which there are Forms A ($y_{12}$) and B ($y_{22}$) initially and retests
on Forms A ($y_{13}$) and B ($y_{23}$) three years later. Cronbach
partitions the variance using the immediate and retest correlations among
forms (assumed parallel), which in our model correspond to the elements
of $\hat{\Sigma}$. We may translate Cronbach's partitioning procedure
into functions of the model parameters in equations (16a), (16b), (16c),
(16d), and (16e) as follows:

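1. "Lasting General Variance" = $\rho(y_{12}, y_{23}) = \lambda^*_{12} \rho(T_2, T_3) \lambda^*_{23}$,
which according to the model equals
$\rho(y_{22}, y_{13}) = \lambda^*_{22} \rho(T_2, T_3) \lambda^*_{13}$.

2. "Temporary General Variance" = $\rho(y_{12}, y_{22}) - \rho(y_{12}, y_{23}) = \lambda^*_{12} \lambda^*_{22} - \lambda^*_{12} \rho(T_2, T_3) \lambda^*_{23}$,
which according to the model equals
$\rho(y_{12}, y_{22}) - \rho(y_{22}, y_{13}) = \lambda^*_{12} \lambda^*_{22} - \lambda^*_{22} \rho(T_2, T_3) \lambda^*_{13}$.
In principle there is a different "Temporary General Variance" for the
end time,
$\rho(y_{13}, y_{23}) - \rho(y_{12}, y_{23}) = \lambda^*_{13} \lambda^*_{23} - \lambda^*_{12} \rho(T_2, T_3) \lambda^*_{23}$,
which equals
$\rho(y_{13}, y_{23}) - \rho(y_{22}, y_{13}) = \lambda^*_{13} \lambda^*_{23} - \lambda^*_{22} \rho(T_2, T_3) \lambda^*_{13}$.

3. "Lasting Specific Variance" for Form A =
$\rho(y_{12}, y_{13}) - \rho(y_{12}, y_{23}) = \rho(y_{12}, y_{13}) - \rho(y_{22}, y_{13}) = \sqrt{1 - (\lambda^*_{12})^2} \, \rho(e_{12}, e_{13}) \sqrt{1 - (\lambda^*_{13})^2}$
and for Form B =
$\rho(y_{22}, y_{23}) - \rho(y_{12}, y_{23}) = \rho(y_{22}, y_{23}) - \rho(y_{22}, y_{13}) = \sqrt{1 - (\lambda^*_{22})^2} \, \rho(e_{22}, e_{23}) \sqrt{1 - (\lambda^*_{23})^2}$.

4. "Temporary Specific Variance" =
$[1 - \rho(y_{12}, y_{22})] - [\rho(y_{12}, y_{13}) - \rho(y_{12}, y_{23})] = [1 - \rho(y_{12}, y_{22})] - [\rho(y_{12}, y_{13}) - \rho(y_{22}, y_{13})] = 1 - \lambda^*_{12} \lambda^*_{22} - \sqrt{1 - (\lambda^*_{12})^2} \, \rho(e_{12}, e_{13}) \sqrt{1 - (\lambda^*_{13})^2}$
for the correlations used by Cronbach, but in principle there are three
other temporary specific variances:
$1 - \lambda^*_{12} \lambda^*_{22} - \sqrt{1 - (\lambda^*_{22})^2} \, \rho(e_{22}, e_{23}) \sqrt{1 - (\lambda^*_{23})^2}$,
$1 - \lambda^*_{13} \lambda^*_{23} - \sqrt{1 - (\lambda^*_{12})^2} \, \rho(e_{12}, e_{13}) \sqrt{1 - (\lambda^*_{13})^2}$,
and
$1 - \lambda^*_{13} \lambda^*_{23} - \sqrt{1 - (\lambda^*_{22})^2} \, \rho(e_{22}, e_{23}) \sqrt{1 - (\lambda^*_{23})^2}$.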
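The four components of Cronbach's partition can be sketched numerically from standardized model parameters. The Python example below assumes parallel forms (equal standardized loadings within each occasion) and uses hypothetical parameter values; it shows that the components are simple differences of model-implied correlations and that they sum to one.

```python
import math

# Hypothetical standardized parameters for parallel forms (assumed values).
lam2 = 0.9        # standardized loading of either form at the initial occasion
lam3 = 0.9        # standardized loading of either form at the final occasion
rho_T = 0.8       # correlation between T2 and T3
rho_eA = 0.5      # correlation of Form A's errors across occasions


def correlations():
    """Model-implied correlations used in the partition."""
    err = math.sqrt(1 - lam2**2) * rho_eA * math.sqrt(1 - lam3**2)
    return {
        "same_time_diff_form": lam2 * lam2,             # rho(y12, y22)
        "retest_diff_form": lam2 * rho_T * lam3,        # rho(y12, y23)
        "retest_same_form": lam2 * rho_T * lam3 + err,  # rho(y12, y13)
    }


r = correlations()
lasting_general = r["retest_diff_form"]
temporary_general = r["same_time_diff_form"] - r["retest_diff_form"]
lasting_specific = r["retest_same_form"] - r["retest_diff_form"]
temporary_specific = (1 - r["same_time_diff_form"]) - lasting_specific
total = (lasting_general + temporary_general
         + lasting_specific + temporary_specific)
print(lasting_general, temporary_general, lasting_specific,
      temporary_specific, total)
```

The lasting specific component isolates the correlated-error term, which is the only place the same-form error correlation enters.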



It can be seen that Cronbach's procedure for partitioning of variance
involves complicated functions of the model parameters. Not only is it
simpler to analyze observed correlations in terms of a set of structural
parameters, but it allows for analysis of overidentified models. Further
light can be shed on the assumptions implicit in the model of (16a),
(16b), (16c), (16d), and (16e) by asking what variables account for the
correlated errors. Assuming that a single factor ($M_1$) underlies the
correlation for Form A and another factor ($M_2$) for Form B the model
becomes:



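$$y = \mu + \Lambda T + e , \qquad (17a)$$

where

$$y' = (y_{12}, y_{22}, y_{13}, y_{23}) , \qquad (17b)$$
$$T' = (T_2, T_3, M_1, M_2) , \qquad (17c)$$
$$e' = (e_{12}, e_{22}, e_{13}, e_{23}) , \qquad (17d)$$

$$\Lambda = \begin{bmatrix} 1 & 0 & B_{12} & 0 \\ 1 & 0 & 0 & B_{22} \\ 0 & 1 & B_{13} & 0 \\ 0 & 1 & 0 & B_{23} \end{bmatrix} , \qquad (17e)$$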



and 
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Analysis of the identification problem shows that B 12 , B 22 , 



$B_{13}$, and $B_{23}$ are not separately identifiable; only the products
($B_{12} B_{13}$) and ($B_{22} B_{23}$) are identified. This means that
in Jöreskog's program we may arbitrarily set $B_{12} = B_{13}$ and
$B_{22} = B_{23}$ without disturbing the estimation for other
parameters. Assuming $B_{12} = B_{13}$ and $B_{22} = B_{23}$, this model
is a simple transformation of (16a), (16b), (16c), (16d), and (16e)
under essentially tau-equivalent assumptions, that is,
$V_{e_{12}} \neq V_{e_{22}} \neq V_{e_{13}} \neq V_{e_{23}}$ in equation
(16e). In particular it can be seen that it must be assumed that $M_1$
and $M_2$ are uncorrelated. It is possible to deal with oblique true and
method factors but usually more different method measures are required,
as in our 3 trait x 3 method example in Section I.
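The claim that only the product of the two method loadings for a form is identified can be illustrated with a short Python sketch. It assumes unit loadings on the trait factors, a unit-variance method factor, and hypothetical parameter values; the function name is likewise an assumption. Two different parameter sets with the same loading product (and suitably adjusted residual variances) imply identical variances and covariance for the Form A pair.

```python
import math

V_T2, V_T3, C_T2T3 = 1.0, 1.5, 1.2   # hypothetical trait variances and covariance


def implied_form_a(b12, b13, v_e12, v_e13):
    """Implied (V_y12, V_y13, C(y12, y13)) for Form A with method factor M1."""
    v_y12 = V_T2 + b12**2 + v_e12
    v_y13 = V_T3 + b13**2 + v_e13
    c_y12_y13 = C_T2T3 + b12 * b13
    return (v_y12, v_y13, c_y12_y13)


# Parameter set 1: unequal method loadings.
set1 = implied_form_a(b12=0.6, b13=0.3, v_e12=0.5, v_e13=0.7)

# Parameter set 2: equal loadings with the same product; the residual
# variances absorb the change in each squared loading.
b = math.sqrt(0.6 * 0.3)
set2 = implied_form_a(b12=b, b13=b,
                      v_e12=0.5 + 0.6**2 - b**2,
                      v_e13=0.7 + 0.3**2 - b**2)

same = all(abs(x - y) < 1e-12 for x, y in zip(set1, set2))
print(same)
```

Because the data cannot distinguish the two sets, fixing the loadings equal is a harmless normalization rather than a substantive restriction.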

When methods of measuring a trait are made as different as possible,
it is usually the case that the units of measurement are different,
which means that congeneric rather than essentially tau-equivalent or
parallel assumptions are appropriate. Werts and Linn (1970a) consider
growth models based on congeneric measures, e.g., in one case they use
three congeneric measures of $T_2$ and two congeneric measures of $T_3$,
allowing for same test correlated errors over time. This model is
overidentified, but no attempt was made to deal with this complication.
Phrasing this problem in terms of Jöreskog's general model:

$$y = \mu + \Lambda T + e , \qquad (18a)$$

$$y' = (y_{12}, y_{22}, y_{32}, y_{13}, y_{23}) , \qquad (18b)$$

where $y_{12}$ and $y_{13}$ are linked as are $y_{22}$ and $y_{23}$ ,

$$T' = (T_2, T_3, M_1, M_2) ,$$
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(18c) 



(18d) 
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(18e) 



Assuming that $\lambda_{12} = \lambda_{13}$ and $\lambda_{22} = \lambda_{23}$ ,
and for convenience that $B_{12} = B_{13}$ and $B_{22} = B_{23}$ , this
model has four overidentifying restrictions (15 distinct elements in
$\Sigma$ less 11 parameters to be estimated). Werts
and Linn give two formulas (1970a, p. 198, equations (28) and (29)) for
estimating the correlation of status with gain involving observed
correlations and variances, whereas Jöreskog's approach generates a
single estimate by equation (15a). In essence Werts and Linn dealt with
the elements of the observed variance-covariance matrix S, which may
yield inconsistent estimates of $\rho_{T_2 \Delta}$, whereas such
inconsistency cannot arise with respect to the elements in
$\hat{\Sigma}$. Jöreskog has an unpublished operating program for
estimating factor scores within the confirmatory factor analysis model
(Jöreskog, 1971). As Cronbach and Furby (1970) note, however, there is
seldom need for such estimates.



Relationship to Factor Analysis 



A common practice in the factor analysis of growth data is to compare
standardized factor loadings at one time to the loadings for the same
set of measures at a later time. If the pattern of loadings remains
constant over time the inference is drawn that the factors are measuring
essentially the same dimension at different times. For example we might
have three measures of $T_2$ at time 1 with factor loadings
$\lambda^*_{12} = .30$, $\lambda^*_{22} = .40$, and $\lambda^*_{32} = .50$
and identical loadings on $T_3$ when these measures are repeated at
time 2, i.e., $\lambda^*_{13} = .30$, $\lambda^*_{23} = .40$, and
$\lambda^*_{33} = .50$. For heuristic purposes let us suppose that the
repetition of tests did not result in methods factors and that the true
variance increased from $V_{T_2} = 1.0$ to $V_{T_3} = 1.5$ over time and
$C_{T_2 T_3} = 1.2$. It may be immediately inferred that the error
variances for all tests increased over time since the test reliabilities
(in this model the squared factor loadings) remained constant and the
true variance increased. However, Wiley and Wiley (1970) have
persuasively argued that it is more likely that error variances are a
test characteristic which is likely to remain constant over time. If
this is so, then an increase in true variance along the same dimension
will necessarily mean that the reliabilities of the tests will increase
over time, i.e., the standardized factor loadings will increase. In the
same fashion it may be deduced that if for any given test over time the
unstandardized regression weights ($\lambda_{i2} = \lambda_{i3}$) and the
error variances ($V_{e_{i2}} = V_{e_{i3}}$) are equal, then in general
the standardized factor loadings ($\lambda^*_{ij}$) are not proportional
from one time to another. We conclude that comparison of standardized
factor
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loading patterns over time provides no logical base for any conclusions 
about whether pretests and posttests are measuring the same variable.
It appears to us that such an assumption, which in this model is
equivalent to equality of unstandardized regression weights over time
(e.g., $\lambda_{12} = \lambda_{13}$), is basically not testable within
the framework of this model. It would seem better not to make dubious
assumptions that either the reliability or the error variance are
relatively constant (over time) test characteristics, but to build models
and gather requisite information such that these model parameters are
identified.
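The dependence of standardized loadings on the true variance can be made concrete with a small Python sketch. It holds the unstandardized weight and the error variance fixed over time (the Wiley and Wiley assumption) and uses the heuristic true variances 1.0 and 1.5 from the example; the error variance is an assumed value chosen so that the time-1 standardized loading equals .30.

```python
import math


def standardized_loading(lam, v_true, v_error):
    """Standardized loading: correlation of the observed score with the true score."""
    return lam * math.sqrt(v_true) / math.sqrt(lam**2 * v_true + v_error)


lam = 1.0                      # unstandardized weight, held equal over time
v_t2, v_t3 = 1.0, 1.5          # true variances at the two occasions
# Error variance chosen so the time-1 standardized loading is .30 (assumed fixed).
v_error = lam**2 * v_t2 * (1 - 0.30**2) / 0.30**2

load_1 = standardized_loading(lam, v_t2, v_error)
load_2 = standardized_loading(lam, v_t3, v_error)
print(round(load_1, 3), round(load_2, 3))
```

Under these assumptions the second loading exceeds the first even though nothing about the test itself has changed, so unequal standardized loadings do not by themselves indicate a change in what is measured.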

While it is not possible to test the assumption that
$\lambda_{12} = \lambda_{13}$, it is quite possible for this assumption
to be incompatible with the assumption that $\lambda_{22} = \lambda_{23}$.
The ratio of $V_{T_3}$ to $V_{T_2}$ resulting from
$\lambda_{12} = \lambda_{13}$ may differ from the ratio resulting from
$\lambda_{22} = \lambda_{23}$. This may be tested by the increase in
$\chi^2$ (df = 1) resulting from the addition of
$\lambda_{22} = \lambda_{23}$ to the model in which
$\lambda_{12} = \lambda_{13}$. Within the framework of this model, if it
is true that the corresponding pairs of tests over time in fact have the
same units, then the scaling of $V_{T_3}$ to $V_{T_2}$ should be the same
for each pair.
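A chi-square difference test of this kind can be sketched in a few lines of Python. The closed forms used below hold only for 1 or 2 degrees of freedom, and the chi-square values passed in are hypothetical, not results from this report; the function name is also an assumption.

```python
import math


def chi_square_difference_p(chi2_restricted, chi2_general, df_diff):
    """Upper-tail probability of a chi-square difference, for df_diff of 1 or 2 only."""
    diff = chi2_restricted - chi2_general
    if diff < 0:
        raise ValueError("the restricted model cannot fit better than the general model")
    if df_diff == 1:
        return math.erfc(math.sqrt(diff / 2.0))   # closed form for 1 df
    if df_diff == 2:
        return math.exp(-diff / 2.0)              # closed form for 2 df
    raise ValueError("only df_diff = 1 or 2 are handled in this sketch")


# Hypothetical fit statistics: the model with both equality constraints
# versus the model with only the first constraint.
p_value = chi_square_difference_p(chi2_restricted=6.1, chi2_general=1.9, df_diff=1)
print(p_value < 0.05)
```

A small probability would indicate that the two scaling assumptions are incompatible with each other; it would not indicate which of the two is at fault.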

The finding that the data are consistent with the hypothesis that
$\lambda_{12} = \lambda_{13}$ and $\lambda_{22} = \lambda_{23}$ does not
necessarily imply that the units of measurement for the corresponding
pairs of tests over time are the same, since it is quite possible for the
scaling to be erroneous for both pairs of tests but in the same way. If
the data are inconsistent with the hypothesis that
$\lambda_{12} = \lambda_{13}$ and $\lambda_{22} = \lambda_{23}$ we could
conclude that the units over time are not the same for both sets of
tests, but it is still possible that the units are the same for one of
the sets over time. Even if it could be shown that
$\lambda_{12} = \lambda_{13}$, this would only be evidence consistent
with, not proof of, the hypothesis that the scales are measuring the same
process over time.

Determinants of Growth 

Werts and Linn (1970b) have considered the problem of making inferences
about the determinants in a linear model. The Werts-Linn formulation
was based on classical true score assumptions, i.e., no provision
was made for methods factors. For heuristic purposes let us reconsider
the problem of growth determinants, formulating the three trait, three
method model in terms of growth ($T_3 = T_2 + \Delta$):
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It should be noted that although this formulation does not directly 
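involve the parameters of the underlying growth model
$\Delta = D_{\Delta T_1 \cdot T_2} T_1 + D_{\Delta T_2 \cdot T_1} T_2 + \zeta$ ,
the regression weights are:

$$D_{\Delta T_1 \cdot T_2} = \frac{C_{T_1 \Delta} - C_{T_2 \Delta} C_{T_1 T_2}}{1 - C^2_{T_1 T_2}} \qquad (19d)$$

and

$$D_{\Delta T_2 \cdot T_1} = \frac{C_{T_2 \Delta} - C_{T_1 \Delta} C_{T_1 T_2}}{1 - C^2_{T_1 T_2}} \qquad (19e)$$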
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Traditional test theorists (e.g., Bloom, 1964; Thorndike, 1966) 
have been very concerned with and have drawn substantive inferences
about the determinants of growth from the correlation of status with
gain, usually corrected for "attenuation." However, as detailed by
Werts and Linn (1970b), in a linear structural model prime interest is
in the model parameters $D_{\Delta T_1 \cdot T_2}$ and
$D_{\Delta T_2 \cdot T_1}$, since if either one is zero the inference
will be drawn that the corresponding variable does not directly
influence gain. Except in the case in which initial status is
uncorrelated with all determinants of growth, knowledge of the
correlation of status with gain, $\rho_{T_2 \Delta}$, does not allow us
to draw inferences about model parameters. It is quite possible for
$\rho_{T_2 \Delta}$ to be completely spurious due to a common antecedent
influence, or it is quite possible for $\rho_{T_2 \Delta}$ to be zero
without implying that $D_{\Delta T_1 \cdot T_2}$ or
$D_{\Delta T_2 \cdot T_1}$ be zero. For this reason we question
Thorndike's (1966, p. 124) interpretation: "In considerable part, the
factors that produce gains during a specified time span appear to be
different from those that produced the level of competence exhibited at
the beginning of the period." Our objection is that Thorndike's
conclusion was made from the correlation of status with gain, without
specifically introducing into the analysis any presumed determinants of
growth. In a linear structural model the total association of initial
status with growth is an insufficient basis for drawing inferences about
the various possible determinants of growth.
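The point that a zero correlation of status with gain does not imply a zero structural weight can be illustrated numerically. The Python sketch below uses the standard two-predictor partial regression formula with both predictors standardized to unit variance; the structural weights and the predictor correlation are hypothetical values chosen so that the covariance of initial status with gain is exactly zero.

```python
def growth_weights(c_t1_delta, c_t2_delta, c_t1t2):
    """Partial regression weights of gain on T1 and T2 (unit-variance predictors)."""
    denom = 1.0 - c_t1t2**2
    d_t1 = (c_t1_delta - c_t2_delta * c_t1t2) / denom   # weight for T1, holding T2 constant
    d_t2 = (c_t2_delta - c_t1_delta * c_t1t2) / denom   # weight for T2, holding T1 constant
    return d_t1, d_t2


# Hypothetical structural weights and predictor correlation (assumed values).
true_d1, true_d2, c12 = 0.4, -0.2, 0.5
c_t1_delta = true_d1 + true_d2 * c12     # Cov(T1, gain) implied by the model
c_t2_delta = true_d1 * c12 + true_d2     # Cov(T2, gain): zero for these values

d1, d2 = growth_weights(c_t1_delta, c_t2_delta, c12)
print(c_t2_delta, round(d1, 3), round(d2, 3))
```

Here initial status has a nonzero direct weight on gain even though its total association with gain vanishes, because the direct effect is offset by the path through the correlated determinant.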



Discussion 

The variety of test response tendencies covered by the rubric
"methods factors" appears to be an almost universal complication in
sociopsychological growth studies. Even though in principle the
multitrait-multimethod model presented in this paper provides for
"methods factors," it does not follow that this model does in fact
provide a better simulation of reality than previous models which have
typically ignored methods factors by assuming independent errors of
measurement. It may be expected that our procedure will typically yield
different parameter estimates (e.g., correlation of status with gain)
than previous procedures, but what has been learned about growth and its
determinants thereby? What is learned about reality from the
overwhelming concern of the factor analyst with statistical fit? There
is no guarantee that the best fitting model yields substantively
meaningful results (e.g., Werts, Jöreskog, & Linn, 1971). Why bother
with complicated structural models involving unmeasured variables when it
is likely that a simple regression equation involving only measured
variables will provide the best prediction of the criterion? From our
perspective, if the researcher's basic interest is in reality, then the
research must be designed to explore reality, i.e., to offer evidence as
to which of the initially plausible alternative hypotheses (models)
provides the
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better simulation. In some cases this may involve a study of the 
theoretical implications to see what information is necessary to
discriminate between the alternative models. In other cases the
study may be a continuing one as in the building of models to
simulate the national economy, in which case the ability to better
predict new yearly data is used to discriminate among models. Our
purpose in making these remarks is to heighten the awareness of
researchers that parameter estimates, such as the reliability of
gain scores, are always made within the framework of a whole set
of untested assumptions about the nature of reality. It is
misleading to talk about "the correlation of status with gain" since
the meaning of this parameter is totally a function of the particular
model used to derive the parameter. In most cases in which
this type of estimate has been used, no effort has been made to
examine the validity, or even plausibility, of the models underlying
these estimates. The linear structural model presented herein is
as suspect as any other model and needs to be justified as one of
the plausible alternative hypotheses, prior to data analysis.
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V. Conclusions 



Sections III and IV constitute fulfillment of the project
objectives as stated in the original proposal. The entire written
output of this project has been or is in the process of being
disseminated to the various relevant audiences. All material has been
published or been accepted for formal publication in the final form
given in this report.

The substantive conclusions of this project are stated in
sections III and IV. While we have succeeded in integrating the
methodological literature within the scope of the project, the
limitations of our approach need to be stated. The study of the
methodological literature alone cannot lead to any conclusions
about which kinds of educational growth problems it would be
appropriate to apply these methods to. It is much clearer in the
physical sciences that quantitative analysis is appropriate only
when the mathematical model underlying the analytical procedure
approximately simulates the process under study. In the social
sciences it is typically unclear whether the model underlying the
statistics being used has any resemblance to the phenomenon, usually
because we know very little about how the phenomenon actually works.
In our judgment, priority should be given to work that attempts to
match methodology to particular substantive problems.
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1. Comment on "The estimation of measurement error in panel data, 

2. Comment on Boyle's "Path analysis and ordinal data."

3. Errata to the Werts-Linn comments on Boyle's "Path analysis
and ordinal data."

4. Another perspective on "Linear regression, structural
relations, and measurement error."

5. A congeneric model for platonic true scores.

6. Estimating true scores using group membership.

7. Errors of inference due to errors of measurement.

8. Identification and estimation in path analysis with
unmeasured variables.

9. Intraclass reliability estimates: Testing structural
assumptions.
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COMMENT* ON "THE ESTIMATION OF 
MEASUREMENT ERROR IN 
PANEL DATA" 

Wiley and Wiley (1970) have made a contribution to the literature on
dealing with errors of measurement by showing how to build a model
employing the assumption of homogeneity of error variance in panel data.
They argue that this assumption is more plausible than the assumption
that the reliability remains constant over time (Heise, 1969). Since we
have available data which allow a statistical test of which assumption is
the most plausible, this note was written to give the results of this
test and to demonstrate how such tests can be performed when at least
four sequential measurements are available.

*The research reported herein was performed pursuant to Grant No.
OEG-2-700033(509) with the United States Department of Health,
Education, and Welfare and the Office of Education.

The model employed by Wiley and Wiley (1970) is shown in Fig. 1. In this
model the reliability of a measure ($x_t$) is the square of the
correlation ($\rho_t$) between that measure and its underlying true score
($\xi_t$). Denoting $a^*_{21}$ and $a^*_{32}$ as the standardized path
coefficients corresponding to $a_{21}$ and $a_{32}$ respectively, path
analysis indicates that the correlations generated by the model are:
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FiovtE 1. A Three Wave Model 



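$$\rho(x_1 x_2) = \rho_1 a^*_{21} \rho_2$$
$$\rho(x_2 x_3) = \rho_2 a^*_{32} \rho_3 \qquad (1)$$
$$\rho(x_1 x_3) = \rho_1 a^*_{21} a^*_{32} \rho_3$$

It follows from (1) that

$$\rho_2^2 = \frac{\rho(x_1 x_2) \rho(x_2 x_3)}{\rho(x_1 x_3)} \qquad (2)$$

$$[\rho_1 a^*_{21}]^2 = \frac{\rho(x_1 x_2) \rho(x_1 x_3)}{\rho(x_2 x_3)} \qquad (3)$$

$$[\rho_3 a^*_{32}]^2 = \frac{\rho(x_2 x_3) \rho(x_1 x_3)}{\rho(x_1 x_2)} \qquad (4)$$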

Thus, without making any assumptions about homogeneity of error
variances or reliabilities, it has been demonstrated that the reliability
of $x_2$ ($\rho_2^2$) is identifiable, and hence also that the
corresponding error variance $V(\epsilon_2) = V(x_2)[1 - \rho_2^2]$ and
true score variance $V(\xi_2) = V(x_2) - V(\epsilon_2)$ are identifiable.
For the two outer measures $x_1$ and $x_3$, only the products
$[\rho_1 a^*_{21}]$ and $[\rho_3 a^*_{32}]$ are identifiable.
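The three-wave identification result can be sketched in Python as follows. The correlations are the Verbal values for grades 5, 7, and 9 as printed in Table 1, and the standard deviation is the grade 7 Verbal value from the table footnote; the function name is an assumption, and the results are simple moment calculations, not the maximum likelihood estimates reported in Table 2.

```python
def three_wave_identified(r12, r23, r13):
    """Identified quantities of the three-wave model from its three correlations."""
    reliability_2 = r12 * r23 / r13      # reliability of the middle measure
    product_1_sq = r12 * r13 / r23       # squared product for the first measure
    product_3_sq = r23 * r13 / r12       # squared product for the last measure
    return reliability_2, product_1_sq, product_3_sq


# Verbal correlations for grades 5, 7, and 9 (Table 1, above the unities).
rel2, p1_sq, p3_sq = three_wave_identified(r12=0.849, r23=0.868, r13=0.795)

var_x2 = 12.704 ** 2                     # grade 7 Verbal variance
error_var_2 = var_x2 * (1 - rel2)        # error variance of the middle measure
true_var_2 = var_x2 - error_var_2        # true score variance of the middle measure
print(round(rel2, 3), round(error_var_2, 2), round(true_var_2, 2))
```

Only the middle wave yields a separate reliability; for the outer waves the reliability and the adjacent path coefficient remain confounded in a single product.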

Now consider the case in which four sequential measurements are
available. Making the same assumptions about the fourth measure that
Wiley and Wiley (1970) made about the first three, the model in Fig. 2 is
obtained. Generalizing the results of equations (2), (3), and (4), we see
that in Fig. 2:

(a) $\rho_2$, $V(\xi_2)$, $V(\epsilon_2)$, and the product
$[\rho_1 a^*_{21}]$ may be identified using either $x_1$, $x_2$, and
$x_3$ or $x_1$, $x_2$, and $x_4$;

(b) $\rho_3$, $V(\xi_3)$, $V(\epsilon_3)$, and the product
$[\rho_4 a^*_{43}]$ may be identified using either $x_2$, $x_3$, and
$x_4$ or $x_1$, $x_3$, and $x_4$.

Path analysis of Fig. 2 also indicates that
$\rho(x_2 x_3) = \rho_2 a^*_{32} \rho_3$ and
$\rho(x_1 x_4) = \rho_1 a^*_{21} a^*_{32} a^*_{43} \rho_4$,
which means that $a^*_{32}$ is overidentified. There-
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Table 1. Correlations for Quantita- 
tive (below Unities) and Verbal (above Unities) Test Scores.^a

Grade        5        7        9       11
  5      1.000     .849     .795     .779
  7       .742    1.000     .868     .838
  9       .718     .747    1.000     .860
 11       .687     .686     .791    1.000

Figure 2. A Four-Wave Model

fore $a_{32} = a^*_{32} \sqrt{V(\xi_3) / V(\xi_2)}$ is identifiable.
Generalizing to multiple wave panel studies, we may state that, when the
assumptions of the Wiley and Wiley structural model are given, error
variances, true score variances, and unstandardized regression weights
between corresponding true scores are identified for all but the first
and last measures. For this reason it appears unnecessary to make either
the equal reliability or the equal error variance assumption for inner
measures. However, one might wish to know which is the better assumption
to make about the first and last measures in order to identify the
corresponding true and error variances and regression weights among true
scores. Given at least four-wave data, suggestive but not conclusive
evidence about which (if either) assumption is better may be obtained by
comparing the estimated error variances and reliabilities for the inner
measures.

^a Standard deviations for quantitative scores are 8.986, 13.771,
16.986, and 17.699, respectively; standard deviations for verbal scores
are 11.748, 12.704, 13.756, and 14.379, respectively.

The four-wave data to be analyzed using the 
model in Fig. 2 were collected in a longitudinal 
study (Anderson and Maier, 1963), which in- 



Table 2. Model Parameter Estimates and Goodness of Fit Tests. 







Estimates* 






x 2 


Fit 




Model 




h 


h IP 




a 32 


d.f. 


P 






% SCAT-V Data 












Fig. 2 

*32 m 1 
?2 " p 3 


.884 
.877 
.816 


.960 
.941 
.952 


.942 
.927 
.952 


.912 
.903 
.956 


.959 
1.000 
.959 


1.38 . 
42.61 
2.17 


1 

2 
2 


.240 
.000 
.338 


V(c 2 ) - V(c 3 ) 


.887 


.950 


.952 


.908 


.960 


12.18 


2 


.002 






SCAT-Q Data 






•W.N. 






Fig. 2 


.851 


.872 


.919 


.860 


.925 


2.78 


1 


.095 


a 32 * 1 

P 2 " *3 
V(c 2 ) - V(c 3 ) 


.823 
.557 
.851 


.840 
.899 
.873 


.899 
.899 
,.918 


.852 
.894 
.861 


1.000 
.924 
.925 


42.77 
5.40 
2.80 


2 
2 
2 


.000 
.067 
.247 



'the symbol '«*" denotes an estimate ot a pjipulati^ parameter based on sample 



data. 
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eluded a group of students tested in the 5th, 
7th, 9th and 11th grades with the School and 
College Ability Tests (SCAT), which yields a 
Quantitative (Q) and a Verbal (V) score. Table 
I gives (previously unreported) correlations and 
standard deviations on these tests for a sample 
of 703 males with complete data*. 
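The identification argument for the inner measures can be illustrated with the verbal correlations in Table 1. The sketch below is not the authors' estimation procedure. It assumes only the path-analytic relation that, for waves i < j < k in a simplex model, $\rho(x_i x_j)\rho(x_j x_k)/\rho(x_i x_k) = \rho_j^2$, and the names `R_VERBAL` and `rho_middle` are hypothetical. The results are simple moment estimates from single triples of waves, so they should be close to, but not identical with, maximum likelihood estimates.

```python
from math import sqrt

# Verbal (SCAT-V) correlations from Table 1, keyed by (earlier wave, later wave).
R_VERBAL = {
    (1, 2): 0.849, (1, 3): 0.795, (1, 4): 0.779,
    (2, 3): 0.868, (2, 4): 0.838, (3, 4): 0.860,
}


def rho_middle(r, i, j, k):
    """Moment estimate of rho_j from three waves i < j < k."""
    return sqrt(r[(i, j)] * r[(j, k)] / r[(i, k)])


# Each inner path coefficient can be obtained from two different triples.
rho2_a = rho_middle(R_VERBAL, 1, 2, 3)
rho2_b = rho_middle(R_VERBAL, 1, 2, 4)
rho3_a = rho_middle(R_VERBAL, 2, 3, 4)
rho3_b = rho_middle(R_VERBAL, 1, 3, 4)

# alpha*_32 is overidentified: one estimate uses rho(x2 x3),
# the other uses rho(x1 x4) with the products [rho1 alpha*21] and [rho4 alpha*43].
alpha32_a = R_VERBAL[(2, 3)] / (rho2_a * rho3_a)
prod_first = R_VERBAL[(1, 2)] / rho2_a   # [rho1 alpha*21]
prod_last = R_VERBAL[(3, 4)] / rho3_a    # [rho4 alpha*43]
alpha32_b = R_VERBAL[(1, 4)] / (prod_first * prod_last)

for name, value in [("rho2 (1,2,3)", rho2_a), ("rho2 (1,2,4)", rho2_b),
                    ("rho3 (2,3,4)", rho3_a), ("rho3 (1,3,4)", rho3_b),
                    ("alpha*32 via rho(x2 x3)", alpha32_a),
                    ("alpha*32 via rho(x1 x4)", alpha32_b)]:
    print(f"{name}: {value:.3f}")
```

The two values obtained for each quantity differ slightly, which is what overidentification means in practice: a systematic estimation method is needed to reconcile them.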

As Goldberger (1970) notes, the path analysis literature offers no guidance in systematic estimation of overidentified models, such as that depicted in Fig. 2. To obtain estimates, we used Jöreskog's (1970a) general method for the analysis of covariance structures with its associated computer program (Jöreskog et al., 1970). The four-wave model in Fig. 2 is of the quasi-Markov simplex type, the analysis and programming of which is discussed in detail by Jöreskog (1970b). Under the assumption that the observed distributions are normal (reasonable for these data), Jöreskog's procedure yields maximum likelihood estimates of model parameters, and a large sample chi square test is computed for testing the fit of the model to the data. Furthermore, the program allows certain model parameters to be specified as equal to other parameters or to some constant. This is useful for the present problem because the chi square fit before imposing a restriction (e.g., equal error variances) can be compared to the chi square fit for the more restricted model as a measure of the tenability of that restriction. The analysis proceeded in four steps:

1. The model in Fig. 2 was analyzed without assumptions about equal error variances or reliabilities.

2. To test whether it is reasonable to believe that $\tau_2$ and $\tau_3$ are perfectly correlated, the a priori restriction that $\alpha^*_{32} = 1.0$ was imposed. The chi square for this condition less the chi square for the first condition is the chi square with one degree of freedom for testing the restriction.

3. To test the equal reliability assumption, the a priori specification was made that $\rho_2 = \rho_3$. The chi square in this condition less the chi square in the first condition yields a chi square with one degree of freedom for this hypothesis. This assumption is equivalent to the assertion that the error variances are a fixed proportion of the corresponding test variances.

4. To test the equal error variance assumption, the specification was made that $V(\epsilon_2) = V(\epsilon_3)$. The chi square test of this hypothesis is the difference between the chi square for this condition and the one for the first condition and also has one degree of freedom.

The results of the above analysis are shown in Table 2. In step one, for both SCAT-V and SCAT-Q, the $\chi^2$ is small, indicating a good fit. The pattern of the estimates is reasonable in that $\hat\rho_2$ and $\hat\rho_3$ are approximately equal (published test reliabilities are equal and of the same order of magnitude as these estimates), whereas $[\hat\rho_1\hat\alpha^*_{21}]$ and $[\hat\rho_4\hat\alpha^*_{43}]$ are lower, as expected since they are the product of a reliability and a true factor correlation. When the assumption that $\alpha^*_{32} = 1$ is inserted, the $\chi^2$ increased significantly (>40) for both SCAT-V and SCAT-Q. The third step testing the equal reliability assumption yielded a fairly good fit, and the difference $\chi^2$ does not suggest that this hypothesis should be rejected; however $[\hat\rho_4\hat\alpha^*_{43}]$ appears unreasonable since it is approximately equal to $\hat\rho_2$ and $\hat\rho_3$. For SCAT-V $[\hat\rho_4\hat\alpha^*_{43}]$ is slightly larger than $\hat\rho_2$ and $\hat\rho_3$, which would require $\hat\alpha^*_{43}$ to be greater than 1.0 for $\hat\rho_4$ to equal $\hat\rho_2$ and $\hat\rho_3$. In step 4 the difference $\chi^2$ for SCAT-V is statistically significant ($\chi^2_1 = 12.18 - 1.38 = 10.8$) although the absolute magnitude of the difference may not be too important. The step 4 results are more sensible than the step 3 results since $[\hat\rho_1\hat\alpha^*_{21}]$ and $[\hat\rho_4\hat\alpha^*_{43}]$ are both less than $\hat\rho_2$ and $\hat\rho_3$. The step 4 difference $\chi^2$ for SCAT-Q (like step 3) is not statistically significant. Overall, these results suggest that the equal reliability assumption gives a good statistical fit but yields theoretically unreasonable results, whereas the equal error variance assumption may yield poorer fit but estimates which are like the original model of step 1.
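The difference tests described in steps 2 through 4 can be recomputed from the $\chi^2$ values and degrees of freedom reported in Table 2. The following is a minimal sketch, not the program used by the authors. The dictionary layout and function names are hypothetical, and the tail probability is implemented only for one and two degrees of freedom, where closed forms exist.

```python
from math import erfc, exp, sqrt


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate (closed forms for df = 1, 2 only)."""
    if df == 1:
        return erfc(sqrt(x / 2.0))
    if df == 2:
        return exp(-x / 2.0)
    raise ValueError("this sketch handles only df = 1 or df = 2")


# (chi-square, d.f.) for each model, as listed in Table 2.
TABLE2 = {
    "SCAT-V": {"Fig. 2": (1.38, 1), "alpha*32 = 1": (42.61, 2),
               "rho2 = rho3": (2.17, 2), "V(e2) = V(e3)": (12.18, 2)},
    "SCAT-Q": {"Fig. 2": (2.78, 1), "alpha*32 = 1": (42.77, 2),
               "rho2 = rho3": (5.40, 2), "V(e2) = V(e3)": (2.80, 2)},
}


def difference_tests(fits):
    """Compare each restricted model with the unrestricted Fig. 2 model."""
    base_chi2, base_df = fits["Fig. 2"]
    results = {}
    for name, (chi2, df) in fits.items():
        if name == "Fig. 2":
            continue
        d_chi2, d_df = chi2 - base_chi2, df - base_df
        results[name] = (d_chi2, d_df, chi2_sf(d_chi2, d_df))
    return results


DIFFS = {test: difference_tests(fits) for test, fits in TABLE2.items()}

for test, results in DIFFS.items():
    for name, (d_chi2, d_df, p) in results.items():
        print(f"{test:7s} {name:15s} diff chi2 = {d_chi2:6.2f}, d.f. = {d_df}, p = {p:.4f}")
```

Each restriction costs one degree of freedom relative to the Fig. 2 model, so every difference is referred to a chi square distribution with one degree of freedom.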

Charles E. Werts 
Karl G. Joreskog 
Robert L. Linn 
Educational Testing Service 
Princeton, N. J.
COMMENTS ON BOYLE'S "PATH ANALYSIS AND ORDINAL DATA"¹

¹ The research reported herein was performed pursuant to Grant No. OEG-2-700033(509) with the United States Department of Health, Education, and Welfare and the Office of Education.

Boyle (1970) has made a significant contribution to the literature in showing how to use dummy variables in path analysis as a device for investigating scale characteristics. However, if path analysis is applied in causal analyses without provision for unmeasured "underlying" variables, there is an implicit assumption that the causative variables are measured without error (i.e., perfect reliability and validity). When each scale unit of an independent variable is treated as a category in Boyle's procedure, no measurement error corresponds to no errors of placement into categories. If there are placement errors, then the observed scale category may not correspond to the "true" scale category, that is, the dummy variable set used by Boyle to code the scale units for an independent variable would correspond to an observed set of fallible variables which are indicators of an underlying set of "true" dummy variables. Figure 1 illustrates the relationships among true and observed dummy variables for a four-unit scale, residual arrows corresponding to errors of placement into that scale category. The number of observed dummy variables ($D_a$, $D_b$, $D_c$) is one less than the number of scale units or categories, and the true dummy variables ($T_a$, $T_b$, $T_c$) are shown as nonindependent because inclusion in one category necessarily involves exclusion from another category. Since dummy variables are dichotomous, the product moment correlations among these variables are $\phi$ coefficients. Application of path principles to figure 1 shows that the system is underidentified since there are only three correlations among observed variables as compared with nine unknowns (three correlations among errors, three correlations among true dummy variables, and three reliabilities). One solution to the underidentification problem is to use at least three experimentally independent indicators of the independent variable, each of which has the same number of scale categories. For example, in the case of three independent indicators each of which has four levels (i.e., categories), the resulting path diagram would include three "observed" dummy variables (e.g., $D_{a1}$, $D_{a2}$, $D_{a3}$) for each "true" dummy variable (e.g., $T_a$), the placement errors for a given category on one measure being independent of placement errors in the same or different categories for the other two measures.

Fig. 1. Path diagram showing relationships among true and observed dummy variables for a four-unit scale.

A path analysis of this diagram shows that the system is overidentified (36 observed correlations vs. 21 unknown correlations and path coefficients). When the usual dummy variable coding is used (Decomposition II in Boyle's table 1), the correlation ($\phi$) between any two true dummy variables is a function only of the true proportion in these categories:



$$\phi_{ab} = -\sqrt{\frac{P_a P_b}{Q_a Q_b}} \qquad (1)$$

where $\phi_{ab}$ is the correlation between $T_a$ and $T_b$, $P_a$ is the true proportion in category a, $P_b$ is the true proportion in category b, $Q_a = 1 - P_a$, and $Q_b = 1 - P_b$.

It follows from equation (1) that if the correlations among the three true dummy variables are identifiable, then the proportions of the true classification in each category may be identified. The variance of a dichotomous variable is equal to the proportion in that category times the proportion not in that category (e.g., $V_a = P_a Q_a$), and the mean is equal to the proportion in that category (e.g., $P_a$). The variances and the correlations could be used to calculate covariances or unstandardized regression weights as desired. A dependent variable (Y) may be added to the path diagram, path analysis principles again allowing us to find the equations for the unstandardized regression weights on each of the true dummy variables. When the second type of dummy variable coding in Boyle's table 1 is used, the true regression weights represent the difference between the true Y mean of the group coded "1" in that dummy variable and the true Y mean of the reference group. When Boyle's (1970) first type of dummy variable coding (Decomposition I in table 1 of Boyle's paper) is used, then the true regression weights represent the true difference between successive category means, that is, a test of the equal interval assumption under "effect" scaling. This analysis indicates that one of the reasons that the observed regression weights may differ from one scale category to the next is that the degree of measurement error may differ at different points on the scale.
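The dependence of the correlation between two true dummy variables on the category proportions alone can be checked numerically. The sketch below is an illustration under stated assumptions: the category counts are invented, 0/1 indicator coding is assumed, and the closed form $-\sqrt{P_a P_b/(Q_a Q_b)}$ is the standard result for two mutually exclusive indicators.

```python
from math import sqrt


def pearson(u, v):
    """Product moment correlation of two equally long sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v)) / n
    var_u = sum((a - mean_u) ** 2 for a in u) / n
    var_v = sum((b - mean_v) ** 2 for b in v) / n
    return cov / sqrt(var_u * var_v)


# Hypothetical true classification of 100 objects on a four-unit scale.
counts = {"a": 10, "b": 20, "c": 30, "d": 40}
labels = [cat for cat, n in counts.items() for _ in range(n)]

# True dummy variables for categories a and b (0/1 indicator coding).
t_a = [1 if cat == "a" else 0 for cat in labels]
t_b = [1 if cat == "b" else 0 for cat in labels]

n_total = len(labels)
p_a, p_b = counts["a"] / n_total, counts["b"] / n_total
q_a, q_b = 1 - p_a, 1 - p_b

phi_from_data = pearson(t_a, t_b)
phi_from_formula = -sqrt(p_a * p_b / (q_a * q_b))
print(phi_from_data, phi_from_formula)
```

Because membership in one category excludes membership in the other, the correlation is always negative, and no information beyond the two proportions enters it.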

The analytical model discussed above would still apply if the observations consisted of three independent sorts into a set of nominal categories. In this case the analysis is equivalent to an analysis of variance with fallible group information, and the problem is whether the true group means really differ, that is, whether the regression weights for the true dummy variables are all zero.

In passing it might be noted that for overidentified models of the type discussed above, a procedure for estimating the parameters of the model is needed. As Goldberger (1970, p. 25) notes: "the path analysis literature offers no guidance on systematic estimation of overidentified models." Because the distribution of variables (true and observed) is multinomial, the function to be minimized (Mote and Anderson 1965; Cochran 1968, pp. 647-48) for estimation purposes is a $\chi^2$ involving observed and hypothetical ("expected") probabilities. The dummy variable path analysis equations therefore must be translated into probability functions to obtain estimates in overidentified models. In our opinion, path analysis is useful in this type of problem because it helps deal with questions of identifiability, and it is easier for the researcher to conceptualize the relationships among variables.

Charles E. Werts 
Robert L. Linn 

Educational Testing Service 
Werts & Linn (1971) pointed out that Boyle (1970) had implicitly assumed that the causative variables were measured without error. Further study of literature relating to this problem (e.g., Cochran, 1968; Evans, 1970; Anderson, 1959) indicates that the Werts-Linn procedure for dealing with categorical errors of measurement is incorrect. The purpose of this note is to set the record straight.

The research reported herein was performed pursuant to Grant No. OEG-2-700033(509) with the United States Department of Health, Education, and Welfare and the Office of Education.

As a basis for generalization to polychotomous variables, first consider the case of three independent fallible dichotomous measures $X_j$ (j = 1, 2, 3) of an underlying true dichotomy (T). The observed categories will be labelled k = 1, 2 and the true categories $\ell$ = 1, 2. The relationship between $X_j$ and T can be expressed as a function of the conditional probabilities $P\{X_j = k \mid T = \ell\} = \theta_{jk\ell}$ for each combination of k and $\ell$:

$$P\{X_j = 1 \mid T = 1\} = \theta_{j11}, \quad P\{X_j = 1 \mid T = 2\} = \theta_{j12},$$
$$P\{X_j = 2 \mid T = 1\} = \theta_{j21}, \quad \text{and} \quad P\{X_j = 2 \mid T = 2\} = \theta_{j22}.$$

$\theta_{j21}$ is commonly labelled the proportion of false negatives and $\theta_{j12}$ the proportion of false positives. The sum of the conditional probabilities for a fixed value of $\ell$ is unity, i.e., $\theta_{j11} + \theta_{j21} = 1$ and $\theta_{j12} + \theta_{j22} = 1$.

Define $P_{T_\ell} = P\{T = \ell\}$ and $P_{jk} = P\{X_j = k\}$, where $\sum_\ell P_{T_\ell} = 1$ and $\sum_k P_{jk} = 1$. The model parameters to be estimated are the conditional probabilities for each observed measure and the true proportions in each category. Since each object is categorized by each of the different measures, the proportion of objects for each combination of observed categories can be computed. Define $P_{jk,j'k',j''k''} = P\{X_j = k, X_{j'} = k', X_{j''} = k''\}$ where $j \neq j' \neq j''$.

In the three measure case the observed data consist of eight joint probabilities $P_{11,21,31}$, $P_{11,21,32}$, $P_{11,22,31}$, $P_{11,22,32}$, $P_{12,21,31}$, $P_{12,21,32}$, $P_{12,22,31}$, and $P_{12,22,32}$. The next step is to relate these observed probabilities to the model parameters. Starting with $P_{11,21,31}$ we obtain:

$$E(P_{11,21,31}) = P\{X_1 = 1, X_2 = 1, X_3 = 1, T = 1\} + P\{X_1 = 1, X_2 = 1, X_3 = 1, T = 2\}.$$

Expressed in terms of conditional probabilities the proportions are

$$P\{X_1 = 1, X_2 = 1, X_3 = 1, T = 1\} = P\{X_1 = 1 \mid X_2 = 1, X_3 = 1, T = 1\}\,P\{X_2 = 1 \mid X_3 = 1, T = 1\}\,P\{X_3 = 1 \mid T = 1\}\,P\{T = 1\},$$

and

$$P\{X_1 = 1, X_2 = 1, X_3 = 1, T = 2\} = P\{X_1 = 1 \mid X_2 = 1, X_3 = 1, T = 2\}\,P\{X_2 = 1 \mid X_3 = 1, T = 2\}\,P\{X_3 = 1 \mid T = 2\}\,P\{T = 2\}.$$

The assumption that the measures are independent implies that

$$P\{X_1 = 1 \mid X_2 = 1, X_3 = 1, T = 1\} = P\{X_1 = 1 \mid T = 1\} = \theta_{111},$$
$$P\{X_2 = 1 \mid X_3 = 1, T = 1\} = P\{X_2 = 1 \mid T = 1\} = \theta_{211},$$
$$P\{X_1 = 1 \mid X_2 = 1, X_3 = 1, T = 2\} = P\{X_1 = 1 \mid T = 2\} = \theta_{112}, \text{ and}$$
$$P\{X_2 = 1 \mid X_3 = 1, T = 2\} = P\{X_2 = 1 \mid T = 2\} = \theta_{212}.$$

Thus, by substitution:

$$E(P_{11,21,31}) = \theta_{111}\theta_{211}\theta_{311}P_{T_1} + \theta_{112}\theta_{212}\theta_{312}P_{T_2} \qquad (1)$$

While this process could be repeated for each of the observed joint probabilities, for identification purposes it is better to replace these by the following set:

$$P_{11,21} = P_{11,21,31} + P_{11,21,32},$$
$$P_{11,31} = P_{11,21,31} + P_{11,22,31},$$
$$P_{21,31} = P_{11,21,31} + P_{12,21,31},$$
$$P_{11} = P_{11,21,31} + P_{11,21,32} + P_{11,22,31} + P_{11,22,32},$$
$$P_{21} = P_{11,21,31} + P_{11,21,32} + P_{12,21,31} + P_{12,21,32}, \text{ and}$$
$$P_{31} = P_{11,21,31} + P_{11,22,31} + P_{12,21,31} + P_{12,22,31}.$$

Following the procedure used for $P_{11,21,31}$ it may be shown that:

$$E(P_{11,21}) = \theta_{111}\theta_{211}P_{T_1} + \theta_{112}\theta_{212}P_{T_2}, \qquad (2)$$
$$E(P_{11,31}) = \theta_{111}\theta_{311}P_{T_1} + \theta_{112}\theta_{312}P_{T_2}, \qquad (3)$$
$$E(P_{21,31}) = \theta_{211}\theta_{311}P_{T_1} + \theta_{212}\theta_{312}P_{T_2}, \qquad (4)$$
$$E(P_{11}) = \theta_{111}P_{T_1} + \theta_{112}P_{T_2}, \qquad (5)$$
$$E(P_{21}) = \theta_{211}P_{T_1} + \theta_{212}P_{T_2}, \text{ and} \qquad (6)$$
$$E(P_{31}) = \theta_{311}P_{T_1} + \theta_{312}P_{T_2}. \qquad (7)$$

Note that even though we started with eight joint probabilities, we have only seven equations because of the condition that all the observed probabilities sum to unity. If the model parameters are identifiable then it should be possible to solve these equations for each parameter in terms of the expected probabilities. For this purpose it is convenient to define:

$$C_{jk,j'k',j''k''} = E(P_{jk,j'k',j''k''}) - [E(P_{jk})]\,C_{j'k',j''k''} - [E(P_{j'k'})]\,C_{jk,j''k''} - [E(P_{j''k''})]\,C_{jk,j'k'} - E(P_{jk})\,E(P_{j'k'})\,E(P_{j''k''}),$$

where $C_{jk,j'k'} = E(P_{jk,j'k'}) - E(P_{jk})E(P_{j'k'})$ is the corresponding covariance between two measures, and $Q_{T_\ell} = 1 - P_{T_\ell}$. For the dichotomous case $Q_{T_1} = P_{T_2}$. Solving equations 1 through 7 for $P_{T_1}$ we obtain:

$$\frac{(Q_{T_1} - P_{T_1})^2}{P_{T_1}Q_{T_1}} = \frac{C^2_{11,21,31}}{C_{11,21}\,C_{11,31}\,C_{21,31}} \qquad (8)$$

Equation (8) shows that $P_{T_1}$ and $P_{T_2} = 1 - P_{T_1}$ are identified. Further analysis yields:

$$\theta_{112} = E(P_{11}) - \sqrt{\frac{C_{11,21}\,C_{11,31}}{C_{21,31}}}\,\sqrt{\frac{P_{T_1}}{Q_{T_1}}}, \qquad (9)$$
$$\theta_{212} = E(P_{21}) - \sqrt{\frac{C_{11,21}\,C_{21,31}}{C_{11,31}}}\,\sqrt{\frac{P_{T_1}}{Q_{T_1}}}, \text{ and} \qquad (10)$$
$$\theta_{312} = E(P_{31}) - \sqrt{\frac{C_{11,31}\,C_{21,31}}{C_{11,21}}}\,\sqrt{\frac{P_{T_1}}{Q_{T_1}}}. \qquad (11)$$

Since $P_{T_1}$ and $P_{T_2}$ are identified, equations (9), (10), and (11) show that $\theta_{112}$, $\theta_{212}$, $\theta_{312}$ and therefore $\theta_{122} = 1 - \theta_{112}$, $\theta_{222} = 1 - \theta_{212}$, and $\theta_{322} = 1 - \theta_{312}$ are identifiable. Given $P_{T_1}$, $P_{T_2}$, $\theta_{112}$, $\theta_{212}$, and $\theta_{312}$ identified, equations (5), (6), and (7) show that $\theta_{111}$, $\theta_{211}$, and $\theta_{311}$ and therefore $\theta_{121} = 1 - \theta_{111}$, $\theta_{221} = 1 - \theta_{211}$, and $\theta_{321} = 1 - \theta_{311}$ are identifiable. Since the model consists of seven equations in seven unknowns (i.e., just identified), parameter estimates can be obtained which will exactly reproduce the observed probabilities, i.e., the observed joint probabilities would equal the expected joint probabilities estimated from the parameter estimates. The above analysis shows that the true proportions may be identified given three independent dichotomous measures, a point which Werts & Linn (1971) failed to discover. The right side numerator of equation (8) is the expected value of the triple covariance between $X_1$, $X_2$, and $X_3$, which is the crucial piece of information neglected in the Werts-Linn path approach. Furthermore, path analysis usually ignores variable means, which would result in neglect of equations (9), (10), and (11), which involve means ($P_{jk}$).
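The identification argument for the dichotomous case can be traced numerically. The sketch below is an illustration, not the authors' procedure: it generates the eight expected joint probabilities from assumed parameter values, then recovers those values from the pairwise covariances, the triple covariance, and the category means. The function names and the parameter values are hypothetical, and the code assumes that each measure is positively related to the true dichotomy ($\theta_{j11} > \theta_{j12}$), which fixes the signs of the square roots.

```python
from itertools import product
from math import sqrt


def joint_probs(p_t1, theta_t1, theta_t2):
    """Expected joint probabilities for three independent dichotomous measures.

    theta_t1[j] = P{X_j = 1 | T = 1}; theta_t2[j] = P{X_j = 1 | T = 2}.
    Keys are (k, k', k'') with each category in {1, 2}.
    """
    probs = {}
    for cats in product((1, 2), repeat=3):
        total = 0.0
        for p_t, theta in ((p_t1, theta_t1), (1.0 - p_t1, theta_t2)):
            term = p_t
            for j, k in enumerate(cats):
                term *= theta[j] if k == 1 else 1.0 - theta[j]
            total += term
        probs[cats] = total
    return probs


def recover(probs):
    """Recover P_T1 and the conditional probabilities from the joint probabilities."""
    # Means: P{X_j = 1} for each measure.
    m = [sum(p for c, p in probs.items() if c[j] == 1) for j in range(3)]

    def cov(i, j):
        both = sum(p for c, p in probs.items() if c[i] == 1 and c[j] == 1)
        return both - m[i] * m[j]

    c12, c13, c23 = cov(0, 1), cov(0, 2), cov(1, 2)
    # Triple covariance (third central cross moment).
    c123 = (probs[(1, 1, 1)] - m[0] * c23 - m[1] * c13 - m[2] * c12
            - m[0] * m[1] * m[2])

    ratio = c123 ** 2 / (c12 * c13 * c23)   # equals (Q - P)^2 / (P Q)
    gap = sqrt(ratio / (4.0 + ratio))       # equals |Q - P|, since P + Q = 1
    p_t1 = (1.0 - gap) / 2.0 if c123 > 0 else (1.0 + gap) / 2.0
    q_t1 = 1.0 - p_t1

    scale = sqrt(p_t1 / q_t1)
    spread = [sqrt(c12 * c13 / c23), sqrt(c12 * c23 / c13), sqrt(c13 * c23 / c12)]
    theta_t2 = [m[j] - spread[j] * scale for j in range(3)]
    # Solve the mean equations for the remaining conditional probabilities.
    theta_t1 = [(m[j] - theta_t2[j] * q_t1) / p_t1 for j in range(3)]
    return p_t1, theta_t1, theta_t2


true_p = 0.3
true_t1 = [0.90, 0.80, 0.85]
true_t2 = [0.20, 0.25, 0.10]
est_p, est_t1, est_t2 = recover(joint_probs(true_p, true_t1, true_t2))
print(est_p, est_t1, est_t2)
```

With population (expected) probabilities the recovery is exact up to rounding, which is what just-identification implies. With sample proportions the same formulas give moment estimates.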



Next consider the trichotomous case in which k = 1, 2, 3, $\ell$ = 1, 2, 3 and j = 1, 2, 3 given the assumption of independent measures. The relationship between the jth observed trichotomy and the true trichotomy involves nine conditional probabilities: $\theta^*_{j11}$, $\theta^*_{j12}$, $\theta^*_{j21}$ and $\theta^*_{j22}$ as defined previously plus

$$P\{X_j = 1 \mid T = 3\} = \theta^*_{j13}, \quad P\{X_j = 2 \mid T = 3\} = \theta^*_{j23},$$
$$P\{X_j = 3 \mid T = 1\} = \theta^*_{j31}, \quad P\{X_j = 3 \mid T = 2\} = \theta^*_{j32}, \quad \text{and} \quad P\{X_j = 3 \mid T = 3\} = \theta^*_{j33}.$$

By definition: $\theta^*_{j11} + \theta^*_{j21} + \theta^*_{j31} = \theta^*_{j12} + \theta^*_{j22} + \theta^*_{j32} = \theta^*_{j13} + \theta^*_{j23} + \theta^*_{j33} = 1$. Let K = total number of categories and J = total number of independent measures. The observed data consist of the $K^J = 27$ joint triple probabilities $P^*_{1k,2k',3k''}$, one of which may be expressed as a function of the other 26. There are $JK^2 = 27$ $\theta^*_{jk\ell}$, JK of which can be stated as a function of the others because for a fixed $\ell$ the $\theta^*_{jk\ell}$ sum to unity, and K = 3 $P^*_{T_\ell}$, one of which can be stated as 1 minus the sum of the others. Therefore there are a total of $JK(K-1) + (K-1) = 20$ independent parameters to be estimated from the $K^J - 1 = 26$ independent observed joint probabilities, i.e., the model has six overidentifying restrictions. This does not necessarily mean that all parameters are identified, and in principle the expected value of each $P^*_{jk,j'k',j''k''}$ should be derived as done previously and the equations solved for each parameter. Rather than attempt this directly, it can be seen that if category three were collapsed into category 2 then the analysis would be identical to that shown for dichotomous variables. The relationships would be (* refers to probabilities prior to collapsing categories)

$$P_{T_1} = P^*_{T_1}, \qquad \text{(12a)}$$
$$\theta_{j11} = \theta^*_{j11}, \text{ and} \qquad \text{(12b)}$$
$$\theta_{j12}(1 - P_{T_1}) = \theta^*_{j12}P^*_{T_2} + \theta^*_{j13}P^*_{T_3}. \qquad \text{(12c)}$$

From our previous analysis we know that the parameters in the right side of equations 12a, b, & c can be identified from

$$P_{11,21,31} = P^*_{11,21,31},$$
$$P_{11,21,32} = P^*_{11,21,32} + P^*_{11,21,33},$$
$$P_{11,22,31} = P^*_{11,22,31} + P^*_{11,23,31},$$
$$P_{11,22,32} = P^*_{11,22,32} + P^*_{11,23,32} + P^*_{11,22,33} + P^*_{11,23,33},$$
$$P_{12,21,31} = P^*_{12,21,31} + P^*_{13,21,31},$$
$$P_{12,21,32} = P^*_{12,21,32} + P^*_{12,21,33} + P^*_{13,21,32} + P^*_{13,21,33},$$
$$P_{12,22,31} = P^*_{12,22,31} + P^*_{12,23,31} + P^*_{13,22,31} + P^*_{13,23,31}, \text{ and}$$
$$P_{12,22,32} = P^*_{12,22,32} + P^*_{12,22,33} + P^*_{12,23,32} + P^*_{12,23,33} + P^*_{13,22,32} + P^*_{13,22,33} + P^*_{13,23,32} + P^*_{13,23,33}.$$

These eight $P_{jk,j'k',j''k''}$ could be entered into the analysis shown for dichotomies and the corresponding parameters in 12a, b, & c identified.
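Collapsing categories amounts to relabelling each observed category and summing the joint probabilities that share a label. The sketch below shows this bookkeeping under stated assumptions: the 27 joint probabilities are invented for illustration, and the function name `collapse` and the tuple-keyed dictionary layout are hypothetical.

```python
from itertools import product


def collapse(joint, mapping):
    """Sum joint probabilities after relabelling every observed category via `mapping`."""
    collapsed = {}
    for cats, p in joint.items():
        key = tuple(mapping[k] for k in cats)
        collapsed[key] = collapsed.get(key, 0.0) + p
    return collapsed


# Hypothetical uncollapsed table: keys are (k, k', k'') for X1, X2, X3.
cells = list(product((1, 2, 3), repeat=3))
weights = {cell: 1.0 + i for i, cell in enumerate(cells)}
total = sum(weights.values())
joint = {cell: w / total for cell, w in weights.items()}

# Category 3 collapsed into category 2 (categories become 1, 2).
into_two = collapse(joint, {1: 1, 2: 2, 3: 2})
# Category 1 collapsed into category 3 (categories become 2, 3).
into_three = collapse(joint, {1: 3, 2: 2, 3: 3})

print(len(into_two), len(into_three))
```

For example, the entry `into_two[(1, 1, 2)]` is the sum of the uncollapsed cells `(1, 1, 2)` and `(1, 1, 3)`, and the cell in which all three measures fall in the merged category is a sum of eight uncollapsed cells.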
In a similar fashion if we collapse category 1 into 3 then:

$$P_{T_2} = P^*_{T_2}, \qquad \text{(12d)}$$
$$\theta_{j22} = \theta^*_{j22}, \text{ and} \qquad \text{(12e)}$$
$$\theta_{j23}(1 - P_{T_2}) = \theta^*_{j21}P^*_{T_1} + \theta^*_{j23}P^*_{T_3}. \qquad \text{(12f)}$$

The right hand parameters in 12d, e, & f would be identified from:

$$P_{12,22,32} = P^*_{12,22,32},$$
$$P_{12,22,33} = P^*_{12,22,31} + P^*_{12,22,33},$$
$$P_{12,23,32} = P^*_{12,23,32} + P^*_{12,21,32},$$
$$P_{12,23,33} = P^*_{12,23,33} + P^*_{12,21,33} + P^*_{12,23,31} + P^*_{12,21,31},$$
$$P_{13,22,32} = P^*_{13,22,32} + P^*_{11,22,32},$$
$$P_{13,22,33} = P^*_{13,22,33} + P^*_{13,22,31} + P^*_{11,22,33} + P^*_{11,22,31},$$
$$P_{13,23,32} = P^*_{13,23,32} + P^*_{13,21,32} + P^*_{11,23,32} + P^*_{11,21,32}, \text{ and}$$
$$P_{13,23,33} = P^*_{13,23,33} + P^*_{13,23,31} + P^*_{13,21,33} + P^*_{13,21,31} + P^*_{11,23,33} + P^*_{11,23,31} + P^*_{11,21,33} + P^*_{11,21,31}.$$

These eight $P_{jk,j'k',j''k''}$ could likewise be entered into the analysis for dichotomies where the two categories are k = 2, 3 instead of k = 1, 2 as shown in our original analysis. We can conclude that $P^*_{T_1}$, $P^*_{T_2}$, and $P^*_{T_3}$ are identified from 12a, d and $\theta^*_{111}$, $\theta^*_{211}$, $\theta^*_{311}$, $\theta^*_{122}$, $\theta^*_{222}$, and $\theta^*_{322}$ from equations 12b, e. The remaining 12 parameters in equations 12c and f have six conditions imposed by equations 12c, f so we need six more equations for identification. The simplest set, which is independent of information used in the dichotomous analyses, is:



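$$P^*_{11,22} = P^*_{11,22,31} + P^*_{11,22,32} + P^*_{11,22,33},$$
$$P^*_{11,32} = P^*_{11,21,32} + P^*_{11,22,32} + P^*_{11,23,32},$$
$$P^*_{12,21} = P^*_{12,21,31} + P^*_{12,21,32} + P^*_{12,21,33},$$
$$P^*_{12,31} = P^*_{12,21,31} + P^*_{12,22,31} + P^*_{12,23,31},$$
$$P^*_{21,32} = P^*_{11,21,32} + P^*_{12,21,32} + P^*_{13,21,32}, \text{ and}$$
$$P^*_{22,31} = P^*_{11,22,31} + P^*_{12,22,31} + P^*_{13,22,31}.$$

Application of the procedure used to derive equation (1) yields:

$$E(P^*_{11,22}) = \theta^*_{111}\theta^*_{221}P^*_{T_1} + \theta^*_{112}\theta^*_{222}P^*_{T_2} + \theta^*_{113}\theta^*_{223}P^*_{T_3},$$
$$E(P^*_{11,32}) = \theta^*_{111}\theta^*_{321}P^*_{T_1} + \theta^*_{112}\theta^*_{322}P^*_{T_2} + \theta^*_{113}\theta^*_{323}P^*_{T_3},$$
$$E(P^*_{12,21}) = \theta^*_{121}\theta^*_{211}P^*_{T_1} + \theta^*_{122}\theta^*_{212}P^*_{T_2} + \theta^*_{123}\theta^*_{213}P^*_{T_3},$$
$$E(P^*_{12,31}) = \theta^*_{121}\theta^*_{311}P^*_{T_1} + \theta^*_{122}\theta^*_{312}P^*_{T_2} + \theta^*_{123}\theta^*_{313}P^*_{T_3},$$
$$E(P^*_{21,32}) = \theta^*_{211}\theta^*_{321}P^*_{T_1} + \theta^*_{212}\theta^*_{322}P^*_{T_2} + \theta^*_{213}\theta^*_{323}P^*_{T_3}, \text{ and}$$
$$E(P^*_{22,31}) = \theta^*_{221}\theta^*_{311}P^*_{T_1} + \theta^*_{222}\theta^*_{312}P^*_{T_2} + \theta^*_{223}\theta^*_{313}P^*_{T_3}. \qquad (13)$$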



Equations (13) in combination with previously identified parameters and equations (11) and (12c, f) identify the remaining parameters. Note that six equations have not been used, these representing the six degrees of overidentification. The method which appears appropriate for estimating parameters when the observed variables are independent polychotomous measures is discussed in Anderson (1959, sec. 3.6) and Cochran (1968, sec. 6). In this procedure a chi square function involving the observed and estimated expected joint probabilities is minimized as a function of the model parameters. The resulting $\chi^2$, with degrees of freedom equal to the number of overidentifying restrictions, is a measure of the fit of the model to the data. Our analysis indicates that given three independent polychotomies (K = 3) all model parameters are identifiable. The number of overidentifying restrictions is equal to $(K^J - 1) - (JK + 1)(K - 1)$ where J = the number of independent measures and K = the number of categories.
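The count of overidentifying restrictions is simple arithmetic and can be tabulated for any number of measures and categories. The short sketch below only evaluates the counting formula stated above; the function name is hypothetical, and a nonnegative count does not by itself establish that every parameter is identified.

```python
def overidentifying_restrictions(num_measures, num_categories):
    """Independent joint probabilities minus independent model parameters."""
    j, k = num_measures, num_categories
    independent_probabilities = k ** j - 1          # K^J - 1
    independent_parameters = (j * k + 1) * (k - 1)  # JK(K-1) + (K-1)
    return independent_probabilities - independent_parameters


for j, k in [(3, 2), (3, 3), (4, 2)]:
    print(f"J = {j}, K = {k}: {overidentifying_restrictions(j, k)} restrictions")
```

Three dichotomous measures give zero restrictions (the just-identified case), and three trichotomous measures give six.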

We may now consider exactly why the Werts-Linn analysis was inappropriate to the problem. For this purpose it is helpful to put the conditional probabilities into matrix form where columns refer to observed categories and rows to true categories:

$$\begin{bmatrix} \theta^*_{j11} & \theta^*_{j21} & \theta^*_{j31} \\ \theta^*_{j12} & \theta^*_{j22} & \theta^*_{j32} \\ \theta^*_{j13} & \theta^*_{j23} & \theta^*_{j33} \end{bmatrix} \qquad (14)$$

As noted earlier each row sums to unity. If the true categories 1, 2, and 3 actually form an ordered set of classifications such that category 1 is "closer" to 2 than to 3 then we would expect that classificatory errors would be more likely for neighboring categories, i.e., $\theta^*_{j12} > \theta^*_{j13}$ and $\theta^*_{j32} > \theta^*_{j31}$. In contrast, if the true categories are basically unordered, it would be more reasonable to expect the likelihood of misclassification to be similar for any of the other classes, i.e., $\theta^*_{j12} = \theta^*_{j13}$, $\theta^*_{j21} = \theta^*_{j23}$, and $\theta^*_{j31} = \theta^*_{j32}$. In other words the probability of misclassification is a function of the underlying scale or "true" category in the case of ordered categories and is not in the case of unordered categories. Werts & Linn implicitly assumed that the errors for one category were uncorrelated with the underlying "true" dummy variable for the same and for other categories, which translated into the present framework corresponds to the analysis for an unordered scale, i.e., for an ordered scale the errors would be correlated with the "true" dummy variables for other categories. It can be algebraically shown that the Werts-Linn procedure leads to incorrect formulae for the expected value of the observed joint probabilities when the categories are ordered. Since Boyle (1970) was examining the problem of ordered categories (i.e., scales) the Werts-Linn approach is not relevant to his problem.
Another perspective on "Linear regression, structural relations, and measurement error."

Charles E. Werts, Robert L. Linn, and Karl G. Jöreskog

Abstract

A stochastic disturbance term appears to be essential for structural models in the social sciences. The analysis of such models is considered from the perspective of Jöreskog's (1970) general model for the analysis of covariance structures.



Isaac (1970) has performed a useful service in dispelling the common misconception that parameters estimated in a regression analysis are necessarily those involved in a structural relation. Researchers who would use the formulae supplied by Isaac should be warned, however, that these apply to a model which is seldom, if ever, relevant. Johnston (1963, pg. 148) notes that this model "hardly seems appropriate for econometric work, since, if it were true, the only reason for points not lying exactly on a straight line would be errors of observation. A stochastic component of behavior would seem an essential in economics." This comment applies equally to psychology, in which the usual type of relationship is like that between fathers' and sons' heights, where even if there were no errors of measurement the correlation would be less than perfect. Adding a stochastic disturbance term, $\mu$, the model becomes $Y = \alpha + \beta X + \mu$. Rather than review the analysis of this model, which is covered by Johnston (1963, Chap. 6), we propose to consider the problem from the perspective of Jöreskog's (1970) general model for the analysis of covariance structures.

Jöreskog (1970, pg. 239) considers a data matrix X of N observations on p response variables and the following model. Rows of X are independently distributed, each having a multivariate normal distribution with the same variance-covariance matrix $\Sigma$ of the form

$$\Sigma = B(\Lambda\Phi\Lambda' + \Psi^2)B' + \Theta^2,$$

and mean vectors given by

$$E(X) = A\Xi P,$$

where $A = (a_{\alpha s})$ is an N x g matrix of rank g and $P = (p_{ti})$ is an h x p matrix of rank h, both being fixed matrices. Means, variances, and covariances are structured in terms of the parameters in $\Xi$, $B$, $\Lambda$, $\Phi$, $\Psi$, and $\Theta$, which may be (a) "fixed" parameters that have been assigned given values, (b) "constrained" parameters that are unknown but equal to one or more other parameters, and (c) "free" parameters that are unknown and unconstrained.

For analytical purposes we start with a model containing a stochastic disturbance term and errors of measurement as given by Isaac (1970, pg. 214). In this model the observed variables are denoted by lower case letters, i.e., $y = Y + \epsilon_y$ and $x = X + \epsilon_x$. In this problem the question of means is not important (since $\alpha$ can be estimated from the $\beta$ estimate), so we proceed by considering the structure of the variance-covariance matrix of the observed variables. The observed vector is (y, x), the factors are ($X$, $\mu$, $\epsilon_y$, $\epsilon_x$), B is an identity matrix, $\Psi$ and $\Theta = 0$,

$$\Lambda = \begin{bmatrix} \beta & 1 & 1 & 0 \\ 1 & 0 & 0 & 1 \end{bmatrix} \quad \text{and} \quad \Phi = \begin{bmatrix} \sigma^2_X & 0 & 0 & 0 \\ 0 & \sigma^2_\mu & 0 & 0 \\ 0 & 0 & \sigma^2_{\epsilon_y} & 0 \\ 0 & 0 & 0 & \sigma^2_{\epsilon_x} \end{bmatrix}.$$
Since there are 2 x 3/2 = 3 distinct elements in $\Sigma$ and five free parameters, this model is underidentified by 2 restrictions (d.f. = 3 - 5). If the error variances $\sigma^2_{\epsilon_y}$ and $\sigma^2_{\epsilon_x}$ were known a priori (possibly computed from known reliabilities for measures), then the model would be just identified and the associated computer program (Jöreskog, Gruvaeus, and van Thillo, 1970) could be used to obtain maximum likelihood estimates of parameters. Because the model is just identified, the estimated elements of $\Sigma$ would exactly equal corresponding elements in the observed variance-covariance matrix.
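The underidentification of the disturbance-term model can be seen by computing the implied covariance matrix for two different sets of parameter values. The sketch below is an illustration under stated assumptions: it takes B as an identity matrix and $\Psi = \Theta = 0$, so that $\Sigma = \Lambda\Phi\Lambda'$, it orders the factors as (X, disturbance, error in y, error in x), and the numerical values and function names are hypothetical.

```python
def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def implied_sigma(beta, var_x, var_dist, var_err_y, var_err_x):
    """Covariance matrix of (y, x) implied by y = beta*X + disturbance + e_y, x = X + e_x."""
    lam = [[beta, 1.0, 1.0, 0.0],   # loadings of y on (X, disturbance, e_y, e_x)
           [1.0, 0.0, 0.0, 1.0]]    # loadings of x
    variances = [var_x, var_dist, var_err_y, var_err_x]
    phi = [[variances[i] if i == j else 0.0 for j in range(4)] for i in range(4)]
    return matmul(matmul(lam, phi), transpose(lam))


# Two different parameter sets (hypothetical values) ...
sigma_1 = implied_sigma(beta=0.5, var_x=4.0, var_dist=1.0, var_err_y=0.5, var_err_x=0.25)
sigma_2 = implied_sigma(beta=1.0, var_x=2.0, var_dist=0.25, var_err_y=0.25, var_err_x=2.25)

# ... imply the same three distinct elements of Sigma.
print(sigma_1)
print(sigma_2)
```

Because five free parameters must be recovered from only three distinct variances and covariances, many parameter sets reproduce the same matrix, and two further pieces of information (such as known error variances) are needed to pin the parameters down.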

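Isaac's model involves the deletion of $\mu$, i.e., the second column in $\Lambda$ and the second row and column in $\Phi$, in which case there are still 3 distinct elements in $\Sigma$ but the number of free parameters has been reduced to four ($\beta$, $\sigma^2_X$, $\sigma^2_{\epsilon_y}$, $\sigma^2_{\epsilon_x}$) so that only one additional assumption is needed for identification. If, as in Isaac's case 1, $\sigma^2_{\epsilon_x}$ is known, then all parameters are identified. When the ratio of the error variances $\lambda$ is known (Isaac's case #3) then $\Psi$ and $\Theta = 0$ as before, but now:

$$B = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \quad \Lambda = \begin{bmatrix} \beta & \sqrt{\lambda} & 0 \\ 1 & 0 & 1 \end{bmatrix}, \quad \text{and} \quad \Phi = \begin{bmatrix} \sigma^2_X & 0 & 0 \\ 0 & \sigma^2_{\epsilon_x} & 0 \\ 0 & 0 & \sigma^2_{\epsilon_x} \end{bmatrix}.$$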
This model has two elements in $\Phi$ constrained to be equal and two free parameters (since $\sqrt{\lambda}$ is "fixed") and the model is just identified. Isaac's fourth case, in which $\sigma^2_{\epsilon_y}$ and $\sigma^2_{\epsilon_x}$ are both known, is of interest because when these are inserted in $\Phi$ the model has one overidentifying restriction. Assuming that the observed distributions are normal, a chi-square with one degree of freedom is generated which tests the fit of the model to the data. In general, Isaac's equation (3) is not the maximum likelihood solution for this overidentified model; this difference arises because equation (3) uses only the ratio of the error variances, neglecting the absolute values.

Because most effects have multiple causes, it is of interest to consider the case of an exact functional model in which there are only three variables, X, Y, and Z, and causation may occur in any direction. With any variable held constant the true correlation between the other two is perfect, i.e., the true partial correlation between any two variables with the third controlled is unity. However, the squared partial correlation is equal to the product of the two corresponding partial regression weights, e.g., $\rho^2_{XY \cdot Z} = \beta_{XY \cdot Z}\,\beta_{YX \cdot Z} = 1$ in this model. Therefore, the partial regression weight in one direction is the inverse of that in the opposite direction with the same variable controlled, e.g., $\beta_{XY \cdot Z} = 1/\beta_{YX \cdot Z}$. In the model $Y = \alpha + \beta X + \mu$, the stochastic term $\mu$ represents the effects of all other influences, which are assumed to be independent of X. The partial correlation $\rho_{XY \cdot \mu}$ is equal to:



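$$\rho_{XY \cdot \mu} = \frac{\rho_{XY} - \rho_{X\mu}\,\rho_{Y\mu}}{\sqrt{(1 - \rho^2_{X\mu})(1 - \rho^2_{Y\mu})}} = \frac{\rho_{XY}}{\sqrt{1 - \rho^2_{Y\mu}}} = \pm 1 .$$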

Therefore the reciprocal relationship $\beta_{YX \cdot \mu} = 1/\beta_{XY \cdot \mu}$ holds in the stochastic disturbance term model. Since $\mu$ is independent of X, $\beta_{YX \cdot \mu} = \beta_{YX}$, but of course since $\mu$ is not independent of Y this relationship holds only for Y on X.

A variety of other solutions to the identification problem may be used instead of or in combination with those discussed by Isaac. For example, if a "congeneric" measure (Jöreskog, 1970, sec. 2.2) $x_1$ of X (a measure linearly related to X with its own independent error) were added to the model with the stochastic disturbance term, then $\beta$, the slope of $x_1$ on X, $\sigma^2_X$, the error variances of x and $x_1$, and the sum $\sigma^2_\mu + \sigma^2_{\epsilon_y}$ would be identified. The classic psychometric assumption of equal reliability means that the error
variances are proportional to the true variances, e.g., if the reliabilities of x and y were equal, then $\lambda$, the ratio of the error variances, would equal the ratio of the corresponding true variances. This equal reliability assumption in combination with the congeneric measure of X would identify the two components of that sum separately. In principle this congeneric measure serves much the same purpose as the econometrician's use of an "instrumental variable" (Johnston, 1963, sec. 6.5), i.e., a variable which is independent of the measurement errors $\epsilon_x$ and $\epsilon_y$. For example, if an instrumental variable z were available for Isaac's model, the observed vector would be (y, x, z), the factors are (X, Z, $\epsilon_y$, $\epsilon_x$), B = an identity matrix, $\Psi$ and $\Theta$ = 0,

$$\Lambda = \begin{pmatrix} \beta & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 \end{pmatrix}$$

and

$$\Phi = \begin{pmatrix} \sigma^2_X & & & \\ \sigma_{XZ} & \sigma^2_Z & & \\ 0 & 0 & \sigma^2_{\epsilon_y} & \\ 0 & 0 & 0 & \sigma^2_{\epsilon_x} \end{pmatrix} .$$

For convenience a factor Z has been defined identical to z; however, we could have considered the model $z = Z + \epsilon_z$, which would have identified the parameters $\beta$, $\sigma^2_X$, $\sigma^2_{\epsilon_y}$, $\sigma^2_{\epsilon_x}$, and $\sigma_{XZ}$ but not $\sigma^2_Z$.

Jöreskog's general model thus allows the analyst considerable flexibility in his choice of econometric and/or psychometric procedures for dealing with errors of measurement.

In summary, we recommend use of Jöreskog's general model because: (a) It is unnecessary to have estimating formulae for each special case, especially since such formulae do not apply to overidentified models. (b) Attention is focussed on the problem of identification, which is prerequisite to any understanding of the results. (c) Given multivariate normality of observed variables, a chi-squared goodness of fit test is available. If, for example, in Isaac's case #4 we wished to test the hypothesis that $\beta$ was a given value, then the increase in $\chi^2$ (with one degree of freedom) resulting from changing $\beta$ to a fixed parameter is a test of the tenability of this hypothesis. (d) A variety of assumptions may be used singly or in combination, so that whatever information is available may be incorporated, hopefully achieving an overidentified model which can be tested.
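The chi-squared difference test described in point (c) can be sketched as follows. This is a minimal illustration, not part of the original report: the function names and the two fit statistics are hypothetical, and the only fact relied on is that a chi-square variate with one degree of freedom is the square of a standard normal deviate.

```python
import math


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variate with one degree of freedom."""
    if x < 0:
        raise ValueError("chi-square statistic must be non-negative")
    # With 1 df, P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(x / 2.0))


def chi2_difference_test(chi2_restricted, chi2_free):
    """Return (increase in chi-square, p-value) when one parameter is fixed."""
    diff = max(chi2_restricted - chi2_free, 0.0)
    return diff, chi2_sf_1df(diff)


# Hypothetical fit statistics, for illustration only.
increase, p_value = chi2_difference_test(chi2_restricted=6.10, chi2_free=1.25)
print(f"increase in chi-square = {increase:.2f}, p = {p_value:.4f}")
```

A small p-value would suggest that fixing the parameter at the hypothesized value is not tenable; the test presumes the multivariate normality noted in (c).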






Footnote

¹The estimating formula for $\beta$ in case #3, given by equation (3) in Isaac (1970, pg. 215), has a $\lambda$ left out of the denominator. Kendall and Stuart (1961) recommend that the positive root be used. Johnston (1963, pg. 154), however, recommends that the positive root be used when cov(x, y) is positive and the negative root when cov(x, y) is negative.
A CONGENERIC MODEL FOR PLATONIC TRUE SCORES¹
Charles E. Werts, Robert L. Linn, and Karl Jöreskog
Abstract

To resolve a recent controversy between Klein and Cleary and Levy, a model for dichotomous congeneric items is presented which has mean errors of zero, dichotomous true scores that are uncorrelated with errors, and errors that are mutually uncorrelated.

In a discussion of platonic true scores, Klein and Cleary (1967) state that the use of platonic true scores makes the assumptions of classical test theory generally untenable. They illustrate their argument with dichotomous items and a dichotomous true score and show that: "The classical test theory formulation $\sigma^2_X = \sigma^2_T + \sigma^2_E$ can only be true if the mean error is not zero" (Klein & Cleary, 1967, p. 78). This statement is based on the following definitions of observed (X), true (T), and error (E) scores:

T = 1 if phenomenon is present, 0 if phenomenon is absent;

X = 1 if phenomenon is rated as present, 0 if phenomenon is rated as absent;

and E = X - T. Klein and Cleary go on to consider two parallel dichotomous items, $X_1$ and $X_2$, and show that the covariance between $E_1$ and $E_2$ is positive when the errors, $E_1$ and $E_2$, have zero means. With correlated error scores, the correlation between two parallel items overestimates the item reliabilities. In response, Levy (1969) argued that the classical assumptions can be shown to hold for a dichotomous item if

X = a if phenomenon is rated as present, b if phenomenon is rated as absent,

true scores (T) are defined as above, and E = X - T as before. This modification will indeed make it possible for the mean error to be zero and the covariance between T and E to be zero. As Klein and Cleary (1969) note, however, Levy does not provide a means of solving for "a" and "b" without knowledge of T. In any practical application, T would be unknown and therefore "a" and "b" would be unknown. Also, no way of obtaining item reliabilities is presented. The purpose of this paper is to provide an alternative formulation which allows for the model parameters to be determined given the structural specification of zero mean error and no correlation among errors for different items or between errors and true scores. Our approach is drawn from latent structure analysis (Anderson, 1959) for the special case of dichotomous latent variables.

I. A Congeneric Model for Dichotomous Items

The equation for congeneric tests is given by Jöreskog (1968, 1970, 1971) as

$$X_{ij} = B_{jT}\,T_i + I_j + E_{ij} ,$$

where $T_i$ is the true score for person i,
$X_{ij}$ is the observed score on item j for person i,
$B_{jT}$ is the slope of the $X_j$ on T regression line,
$I_j$ is the intercept of this regression line, and
$E_{ij}$ is the error for person i on item j.

To illustrate the application of this definition to the case in which $X_j$ and T are both dichotomous (scored 1, 0), consider the case of three items, which is the minimum number of items required to identify model parameters uniquely, given experimentally independent measures. The equations are

$$X_1 = B_{1T}\,T + I_1 + E_1 ,$$
$$X_2 = B_{2T}\,T + I_2 + E_2 ,$$
$$X_3 = B_{3T}\,T + I_3 + E_3 ,$$

where the E's are mutually uncorrelated and are uncorrelated with T. In the case of dichotomous variables

$$B_{jT} = \frac{P(X_j = 1, T = 1) - P(X_j = 1)P(T = 1)}{P(T = 1)P(T = 0)} = P(X_j = 1 \mid T = 1) - P(X_j = 1 \mid T = 0)$$

and

$$I_j = P(X_j = 1) - B_{jT}\,P(T = 1) = P(X_j = 1 \mid T = 0) .$$

This model is somewhat more complicated than the model considered by Klein and Cleary (1967) where X = T + E with X, T, and E all taking values of 0 or 1. In essence, the congeneric model is equivalent to the model suggested by Levy (1969) if his "a" and "b" are allowed to vary from item to item. For a given item, "$a_j$" would equal $(1 - I_j)/B_{jT}$, "$b_j$" would equal $-I_j/B_{jT}$, and Levy's error would equal the error in the corresponding item equation above divided by $B_{jT}$. To illustrate the point that the congeneric model does allow for the traditional psychometric assumptions in the dichotomous case, consider the following example constructed using the equations provided by Anderson (1959, sec. 2.4).

1. The $\theta_j$ (proportion of false negatives, i.e., $P(X_j = 0 \mid T = 1) = P(X_j = 0, T = 1) / P_T$), $\phi_j$ (proportion of false positives, i.e., $P(X_j = 1 \mid T = 0) = P(X_j = 1, T = 0) / (1 - P_T)$), and $P_T$ (the true proportion, P(T = 1)) are given as:

$\theta_1 = .30$, $\theta_2 = .40$, $\theta_3 = .10$,
$\phi_1 = .10$, $\phi_2 = .50$, $\phi_3 = .30$,
$P_T = .60$, $Q_T = 1 - P_T = .40$.

2. The expected marginal distributions ($P_j$ = Prob($X_j = 1$) = $(1 - \theta_j)P_T + \phi_j Q_T$) are $P_1 = .46$, $P_2 = .56$, and $P_3 = .66$.

3. The expected joint probabilities for pairs of items, $P_{jj'}$ = Prob($X_j = 1$, $X_{j'} = 1$) = $(1 - \theta_j)(1 - \theta_{j'})P_T + \phi_j\phi_{j'}Q_T$ ($j \ne j'$), are: $P_{12} = .272$, $P_{13} = .390$, and $P_{23} = .384$.

4. The expected joint probability for three items, $P_{jj'j''}$ = Prob($X_j = 1$, $X_{j'} = 1$, $X_{j''} = 1$) = $(1 - \theta_j)(1 - \theta_{j'})(1 - \theta_{j''})P_T + \phi_j\phi_{j'}\phi_{j''}Q_T$ ($j \ne j' \ne j''$), is $P_{123} = .2328$.

5. The regression weights ($B_{jT} = 1 - \theta_j - \phi_j$) are $B_{1T} = .60$, $B_{2T} = .10$, and $B_{3T} = .60$.

6. The intercepts ($I_j = P_j - B_{jT}P_T = \phi_j$) are $I_1 = .10$, $I_2 = .50$, and $I_3 = .30$. The possible events for combinations of the three items and the proportion of people in each event are shown in Table 1. The means of the errors are zero, the true score is uncorrelated with the errors, and the errors are uncorrelated with each other.
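The claims in step 6 can be checked numerically. The sketch below is an illustration added here, not part of the original paper; it assumes the example's parameter values ($\theta$ = .30, .40, .10; $\phi$ = .10, .50, .30; $P_T$ = .60, the values consistent with Table 1) and local independence of the items given T. The function and variable names are hypothetical.

```python
from itertools import product

# Example parameters: false-negative rates, false-positive rates, P(T = 1).
theta = [0.30, 0.40, 0.10]
phi = [0.10, 0.50, 0.30]
p_t = 0.60

slopes = [1 - th - ph for th, ph in zip(theta, phi)]  # B_jT = 1 - theta_j - phi_j
intercepts = phi[:]                                   # I_j = P(X_j = 1 | T = 0)


def event_table():
    """Return (proportion, t, x, errors) for all 16 combinations of T, X1, X2, X3."""
    rows = []
    for t in (1, 0):
        for x in product((1, 0), repeat=3):
            prob = p_t if t == 1 else 1 - p_t
            for j, xj in enumerate(x):
                p_one = 1 - theta[j] if t == 1 else phi[j]  # P(X_j = 1 | T = t)
                prob *= p_one if xj == 1 else 1 - p_one
            errors = [xj - slopes[j] * t - intercepts[j] for j, xj in enumerate(x)]
            rows.append((prob, t, x, errors))
    return rows


def expectation(rows, f):
    """Expected value of f(t, errors) over the event table."""
    return sum(prob * f(t, errors) for prob, t, _, errors in rows)


rows = event_table()
mean_errors = [expectation(rows, lambda t, e, j=j: e[j]) for j in range(3)]
# Because the error means are zero, these cross-products are covariances.
cov_error_true = [expectation(rows, lambda t, e, j=j: e[j] * t) for j in range(3)]
cov_errors = [expectation(rows, lambda t, e, a=a, b=b: e[a] * e[b])
              for a, b in ((0, 1), (0, 2), (1, 2))]
print(mean_errors, cov_error_true, cov_errors)
```

All nine quantities should vanish up to floating-point rounding, which is the sense in which the classical assumptions hold in this example.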



Insert Table 1 about here



II. Identification

In an actual problem the situation would be reversed from the example shown in section I, i.e., the probabilities $P_1$, $P_2$, $P_3$, $P_{12}$, $P_{13}$, $P_{23}$, and $P_{123}$ correspond to observed scores, and it would be desirable to identify the seven parameters, $\theta_1$, $\theta_2$, $\theta_3$, $\phi_1$, $\phi_2$, $\phi_3$, and $P_T$. In principle, one could solve the seven equations for this purpose (with $Q_T = 1 - P_T$):

$$P_1 = (1 - \theta_1)P_T + \phi_1 Q_T \tag{2a}$$
$$P_2 = (1 - \theta_2)P_T + \phi_2 Q_T \tag{2b}$$
$$P_3 = (1 - \theta_3)P_T + \phi_3 Q_T \tag{2c}$$
$$P_{12} = (1 - \theta_1)(1 - \theta_2)P_T + \phi_1\phi_2 Q_T \tag{2d}$$
$$P_{13} = (1 - \theta_1)(1 - \theta_3)P_T + \phi_1\phi_3 Q_T \tag{2e}$$
$$P_{23} = (1 - \theta_2)(1 - \theta_3)P_T + \phi_2\phi_3 Q_T \tag{2f}$$
$$P_{123} = (1 - \theta_1)(1 - \theta_2)(1 - \theta_3)P_T + \phi_1\phi_2\phi_3 Q_T \tag{2g}$$

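The solution to these equations is facilitated by noting that in the congeneric model the expected covariance ($C_{jj'}$) between two items is given by

$$C_{jj'} = B_{jT}\,B_{j'T}\,V_T ,$$

where $V_T$ is the variance of T. Translating into probabilities:

$$P_{jj'} - P_jP_{j'} = (1 - \theta_j - \phi_j)(1 - \theta_{j'} - \phi_{j'})\,P_TQ_T . \tag{3}$$

This means that

$$C_{12} = P_{12} - P_1P_2 = (1 - \theta_1 - \phi_1)(1 - \theta_2 - \phi_2)\,P_TQ_T , \tag{4a}$$
$$C_{13} = P_{13} - P_1P_3 = (1 - \theta_1 - \phi_1)(1 - \theta_3 - \phi_3)\,P_TQ_T , \tag{4b}$$
$$C_{23} = P_{23} - P_2P_3 = (1 - \theta_2 - \phi_2)(1 - \theta_3 - \phi_3)\,P_TQ_T . \tag{4c}$$

These equations can be solved for

$$(1 - \theta_1 - \phi_1)^2\,P_TQ_T = \frac{C_{12}C_{13}}{C_{23}} = B_{1T}^2\,P_TQ_T , \tag{5a}$$
$$(1 - \theta_2 - \phi_2)^2\,P_TQ_T = \frac{C_{12}C_{23}}{C_{13}} = B_{2T}^2\,P_TQ_T , \tag{5b}$$
$$(1 - \theta_3 - \phi_3)^2\,P_TQ_T = \frac{C_{13}C_{23}}{C_{12}} = B_{3T}^2\,P_TQ_T . \tag{5c}$$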



The triple covariance $C_{123}$ is defined (Boudon) as the expectation of the products of the deviations of all three variables simultaneously, which is equal in the dichotomous case to

$$C_{123} = P_{123} - P_1P_{23} - P_2P_{13} - P_3P_{12} + 2P_1P_2P_3 . \tag{6}$$

Using equations (2a) through (2g), equation (6) may be translated to $C_{123} = B_{1T}B_{2T}B_{3T}\,P_TQ_T(Q_T - P_T)$, and from equations (5a, b, c) we obtain

$$\frac{C_{123}}{\sqrt{C_{12}C_{13}C_{23}}} = \frac{Q_T - P_T}{\sqrt{P_TQ_T}} . \tag{7}$$



Applying these equations to our example,

1. Compute covariances by equations (4a, b, c): $C_{12} = .0144$, $C_{13} = .0864$, and $C_{23} = .0144$.

2. Using equation (6) compute $C_{123} = -.001728$.

3. From equation (7), $(Q_T - P_T)/\sqrt{P_TQ_T} = C_{123}/\sqrt{C_{12}C_{13}C_{23}}$.

4. Solving for $P_T = 1 - Q_T$ we obtain $P_T = .60$.

5. From equations (5a, b, c) and substituting in this value of $P_T$,
$B_{1T} = .60$,
$B_{2T} = .10$,
$B_{3T} = .60$.

6. It can be shown (equations 2a, b, c) that $\phi_j = P_j - B_{jT}P_T$, permitting calculation of
$\phi_1 = I_1 = .10$,
$\phi_2 = I_2 = .50$,
$\phi_3 = I_3 = .30$.

7. Since $\theta_j = 1 - B_{jT} - \phi_j$,
$\theta_1 = .30$,
$\theta_2 = .40$,
$\theta_3 = .10$.

8. Item reliabilities $R_{jj}$ are $R_{jj} = B_{jT}^2\,P_TQ_T / (P_jQ_j)$, where $Q_j = 1 - P_j$, i.e.,
$R_{11} = .3478$,
$R_{22} = .0097$,
$R_{33} = .3850$.
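Steps 1 through 8 can be collected into a single routine. The following is a sketch under stated assumptions rather than the authors' own program: the function name is hypothetical, the three slopes are assumed positive (so that positive square roots are taken), and the closed-form inversion of equation (7) for $P_T$ is an algebraic step added here. Writing $r = (1 - 2P_T)/\sqrt{P_T(1 - P_T)}$ gives $P_T(1 - P_T) = 1/(r^2 + 4)$ and hence $P_T = \tfrac{1}{2}\,(1 - r/\sqrt{r^2 + 4})$.

```python
import math


def recover_parameters(p1, p2, p3, p12, p13, p23, p123):
    """Solve the just-identified three-item model from observed proportions."""
    # Equations (4a)-(4c): pairwise covariances.
    c12 = p12 - p1 * p2
    c13 = p13 - p1 * p3
    c23 = p23 - p2 * p3
    if c12 * c13 * c23 <= 0:
        raise ValueError("covariances must have a positive product")
    # Equation (6): triple covariance.
    c123 = p123 - p1 * p23 - p2 * p13 - p3 * p12 + 2 * p1 * p2 * p3
    # Equation (7): r = (Q_T - P_T) / sqrt(P_T * Q_T), inverted for P_T.
    r = c123 / math.sqrt(c12 * c13 * c23)
    p_t = 0.5 * (1 - r / math.sqrt(r * r + 4))
    v_t = p_t * (1 - p_t)
    # Equations (5a)-(5c); positive roots assume positive slopes.
    slopes = [math.sqrt(c12 * c13 / c23 / v_t),
              math.sqrt(c12 * c23 / c13 / v_t),
              math.sqrt(c13 * c23 / c12 / v_t)]
    marginals = [p1, p2, p3]
    phi = [p - b * p_t for p, b in zip(marginals, slopes)]       # step 6
    theta = [1 - b - f for b, f in zip(slopes, phi)]             # step 7
    reliabilities = [b * b * v_t / (p * (1 - p))                 # step 8
                     for b, p in zip(slopes, marginals)]
    return {"p_t": p_t, "slopes": slopes, "phi": phi,
            "theta": theta, "reliabilities": reliabilities}


result = recover_parameters(0.46, 0.56, 0.66, 0.272, 0.390, 0.384, 0.2328)
for name, value in result.items():
    print(name, value)
```

With sample proportions rather than expected ones, the covariance product can be non-positive or the recovered rates can fall outside the unit interval, in which case the exact solution is not admissible.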

In the case of three congeneric items the model parameters are just identified, i.e., there are seven equations in seven unknowns, which is the reason that the parameters may be obtained as an exact function of the observed probabilities. In the case of overidentified models one of the estimating procedures discussed by Anderson (1959) can be used. One procedure minimizes a $\chi^2$ function of the observed probabilities ($P_O$) and the expected probabilities ($P_E$) generated as a function of the parameter estimates (Cochran, 1968; Mote & Anderson, 1965). In the general case of J items there will be ($2^J - 1$) independent observed probabilities in the cross-tabulation table from which (2J + 1) parameters are to be estimated. In the special case of two items of equal accuracy the reliability is the correlation between these items, but the model parameters cannot be identified (Cochran, 1968, sec. 6) since $P(X_1 = 1, X_2 = 0) = P(X_1 = 0, X_2 = 1)$, i.e., there are only two independent probabilities to estimate three parameters ($\theta$, $\phi$, $P_T$).
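The counting argument for J items can be made concrete with a few lines of code. This is an illustrative sketch only; the function name is hypothetical, and it covers the single-true-score model in which every item has its own $\theta_j$ and $\phi_j$.

```python
def degrees_of_freedom(n_items):
    """Independent observed proportions minus free parameters for J items."""
    n_proportions = 2 ** n_items - 1   # cells of the 2^J table, less one
    n_parameters = 2 * n_items + 1     # theta_j and phi_j per item, plus P_T
    return n_proportions - n_parameters


for j in (2, 3, 4, 5):
    print(j, degrees_of_freedom(j))
```

A negative count means the model is underidentified, zero means it is just identified (the three-item case), and a positive count gives the degrees of freedom of the minimized chi-square.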

III. Variations

It is sometimes the case that three items with errors that are uncorrelated with true scores or errors of other items are available but one of these measures another variable, i.e.,

$$X_1 = B_1T_1 + I_1 + E_1 ,$$
$$X_2 = B_2T_1 + I_2 + E_2 , \tag{8}$$
$$X_3 = B_3T_2 + I_3 + E_3 .$$

In econometrics $X_3$ is called an "instrumental" variable (Johnston, 1963, p. 165). The equation for $X_3$ can be transformed into

$$X_3 = B_3^*T_1 + I_3^* + E_3^* , \tag{8a}$$

where $B_3^*$ is the product of $B_3$ and the slope of $T_2$ on $T_1$, $B_{T_2T_1}$. $B_3^*$ is identified but $B_3$ and $B_{T_2T_1}$ are not. In the case of dichotomous variables, therefore, the true proportion $P_{T_1}$ may be estimated as shown in section II by treating $X_3$ as a congeneric measure of $T_1$, and $B_3^* = (1 - \theta_3 - \phi_3)(1 - \theta_T - \phi_T)$, where $\theta_T = P(T_2 = 0 \mid T_1 = 1)$ and $\phi_T = P(T_2 = 1 \mid T_1 = 0)$. The validity of such an analysis is dependent on the correctness of the independence assumption.

The above analysis can be extended to the case of four items with mutually uncorrelated errors and no correlation between error and true scores, two of each measuring different variables:

$$X_1 = B_1T_1 + I_1 + E_1 ,$$
$$X_2 = B_2T_1 + I_2 + E_2 ,$$
$$X_3 = B_3T_2 + I_3 + E_3 , \tag{9}$$
$$X_4 = B_4T_2 + I_4 + E_4 .$$

Following the above line of reasoning all parameters in this model ($P_{T_1}$, $P(T_1 = 1, T_2 = 1)$, $P_{T_2}$, $\theta_1$, $\theta_2$, $\theta_3$, $\theta_4$, $\phi_1$, $\phi_2$, $\phi_3$, and $\phi_4$) may be identified. There are 15 independent proportions in the cross-tabulation table, so that the minimized $\chi^2$ would have four degrees of freedom. In principle, a measure of the tenability of certain assumptions is obtained from changes in the $\chi^2$. For example, if it were desired to test the hypothesis that $X_1$ and $X_2$ were of equal accuracy, increases in the total $\chi^2$ (with two degrees of freedom), resulting from setting $\theta_1 = \theta_2$ and $\phi_1 = \phi_2$, would be an indicator of the tenability of this hypothesis.
Footnotes

¹The research reported herein was performed pursuant to Grant No. OEG-2-700033(509) with the United States Department of Health, Education, and Welfare and the Office of Education.

²The true scores are not independent of the error scores or errors of each other, as is assumed in Anderson's (1959) derivations; however, for our purposes the assumption that these variables are uncorrelated yields the same formulas.
Table 1

Possible Events for Three Congeneric Dichotomous Items

Proportion
of People     T    X1   X2   X3     E1     E2     E3

  .2268       1    1    1    1      .3     .4     .1
  .0252       1    1    1    0      .3     .4    -.9
  .1512       1    1    0    1      .3    -.6     .1
  .0168       1    1    0    0      .3    -.6    -.9
  .0972       1    0    1    1     -.7     .4     .1
  .0108       1    0    1    0     -.7     .4    -.9
  .0648       1    0    0    1     -.7    -.6     .1
  .0072       1    0    0    0     -.7    -.6    -.9
  .0060       0    1    1    1      .9     .5     .7
  .0140       0    1    1    0      .9     .5    -.3
  .0060       0    1    0    1      .9    -.5     .7
  .0140       0    1    0    0      .9    -.5    -.3
  .0540       0    0    1    1     -.1     .5     .7
  .1260       0    0    1    0     -.1     .5    -.3
  .0540       0    0    0    1     -.1    -.5     .7
  .1260       0    0    0    0     -.1    -.5    -.3
Estimating True Scores and True Group Means
From Multiple Independent Measures

Charles E. Werts and Robert L. Linn

Abstract

Given multiple independent measures of an underlying true factor and information on group membership it is possible to compute a set of observed group means for each measure. Given at least three tests, these sets of means may be used to compute the reliability of the means for each test. The procedure for estimating true scores from the reliabilities of the individual tests and the group means is derived.



The classical approach to estimating true scores given group membership information is to use the formula:

$$\hat{T}_{ij} = \bar{X}_j + R_{xx}(X_{ij} - \bar{X}_j) , \tag{1}$$

where $\hat{T}_{ij}$ is the estimated true score,
$\bar{X}_j$ is the observed mean of group j,
$R_{xx}$ is the test reliability, assumed homogeneous across groups, and
$X_{ij}$ is the observed score for person i in group j.
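Equation (1) regresses each observed score toward its own group mean in proportion to the unreliability of the test. A minimal sketch follows; the function name and the input layout (a dictionary mapping group labels to lists of scores) are assumptions made for illustration, and the reliability is taken as known.

```python
def estimate_true_scores(scores_by_group, reliability):
    """Apply equation (1) within each group.

    scores_by_group maps a group label to a list of observed scores;
    reliability is R_xx, assumed homogeneous across groups.
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    estimates = {}
    for group, scores in scores_by_group.items():
        group_mean = sum(scores) / len(scores)
        estimates[group] = [group_mean + reliability * (x - group_mean)
                            for x in scores]
    return estimates


# Hypothetical scores for two groups, for illustration only.
print(estimate_true_scores({"a": [10.0, 20.0], "b": [30.0, 34.0]}, 0.5))
```

With a reliability of 1 the estimates equal the observed scores; with a reliability of 0 every person is assigned the group mean.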

If two parallel tests were available the reliability could be computed as the correlation between tests; however, two sets of observed individual values and group means would be observed. The estimation problem is to use both sets of data to obtain a better true score estimate than could be obtained from either. The general problem of using group information to estimate true scores given multiple measures will be considered in this paper.

For illustrative purposes consider the case where congeneric measures of an underlying true score factor are available. Congeneric measures ($X_{ijk}$) are related to the true score ($T_{ij}$):

$$X_{ijk} = B_kT_{ij} + M_k + e_{ijk} , \tag{2}$$

where $X_{ijk}$ is the observed value on person i in group j for test k,
$T_{ij}$ is the underlying true factor,
$B_k$ is the slope of the k-th test on $T_{ij}$,
$M_k$ is the intercept of the regression line of the k-th test scores on the true score, and
$e_{ijk}$ are error components for individual i on test k with zero mean for all levels of $T_{ij}$.

Equation (2) shows that congeneric tests may differ in units of measurement, reliability, and mean, but that they all load on the same underlying factor. Three tests is the minimum number needed to solve for the reliability of each test (Lord & Novick, 1968, equation 9.12.4). The group means may be obtained for each test and from equation (2) it follows that:

$$\bar{X}_{jk} = B_k\bar{T}_j + M_k + \bar{e}_{jk} , \tag{3}$$

where $\bar{X}_{jk}$ is the observed group mean for group j on test k and $\bar{T}_j$ is the group mean on the true score.

¹The research reported herein was performed pursuant to Grant No. OEG-1-6-061830-0650, Project No. 6-1830, with the Office of Education, U. S. Department of Health, Education and Welfare.

For a given test it is useful to derive the condition under which the observed group means do not help to estimate the true score. In the prediction equation for the true score, $T_{ij} = B'X_{ijk} + B''\bar{X}_{jk} + e_{ijk}$, the condition that the group means do not help is that $B'' = 0$. By definition:

$$B'' = \frac{V_{X_k}C_{T\bar{X}_k} - C_{TX_k}C_{X_k\bar{X}_k}}{V_{X_k}V_{\bar{X}_k} - C^2_{X_k\bar{X}_k}} ,$$

where $V_{X_k}$ is the variance of $X_{ijk}$,
$V_{\bar{X}_k}$ is the variance of $\bar{X}_{jk}$,
$C_{T\bar{X}_k}$ is the covariance of $T_{ij}$ and $\bar{X}_{jk}$,
$C_{TX_k}$ is the covariance of $T_{ij}$ and $X_{ijk}$, and
$C_{X_k\bar{X}_k}$ is the covariance of $X_{ijk}$ and $\bar{X}_{jk}$.

It follows that $B'' = 0$ implies $V_{X_k}C_{T\bar{X}_k} = C_{TX_k}C_{X_k\bar{X}_k}$. Since it can be shown that $C_{T\bar{X}_k} = C_{\bar{T}\bar{X}_k}$ and $C_{X_k\bar{X}_k} = V_{\bar{X}_k}$, $B'' = 0$ means that $C_{\bar{T}\bar{X}_k}/V_{\bar{X}_k} = C_{TX_k}/V_{X_k}$, or $B_{\bar{T}\bar{X}_k} = B_{TX_k}$. We know, however, that $B_{\bar{X}_k\bar{T}} = B_{X_kT} = B_k$; therefore $B_{\bar{T}\bar{X}_k}B_{\bar{X}_k\bar{T}} = B_{TX_k}B_{X_kT}$, or $R^2_{\bar{X}_k\bar{T}} = R^2_{X_kT}$, where $R^2_{X_kT}$ is the reliability of test k.

In other words, for a given test, the observed group means will not improve the prediction of the true score when the reliability of the means equals the reliability of the individual scores for that test. Since it is generally found that group means have a higher reliability than the individual scores, knowledge of group means can usually be expected to improve the estimation of true scores.

Our general strategy for estimating the true score $T_{ij}$ will be to derive expressions for the correlation of the true score with each set of the observed individual test scores and with each set of observed group means. These correlations and the set of correlations among the observed variables ($X_{ijk}$ and $\bar{X}_{jk}$) then permit us to solve for the standardized partial regression weights for predicting $T_{ij}$ from the observed variables. The correlation of $\bar{X}_{jk}$ with $T_{ij}$ ($R_{T\bar{X}_k}$) can be derived:
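a. From equations (2) and (3) it follows that

$$C_{T\bar{X}_k} = B_kC_{T\bar{T}} ,$$

where $C_{T\bar{T}}$ is the covariance of $T_{ij}$ and $\bar{T}_j$.

b. Since $C_{T\bar{T}} = V_{\bar{T}}$ (the weighted variance of the means),

$$R_{T\bar{X}_k} = \frac{C_{T\bar{X}_k}}{\sqrt{V_TV_{\bar{X}_k}}} = \frac{B_kV_{\bar{T}}}{\sqrt{V_TV_{\bar{X}_k}}} .$$

c. By definition $R_{\bar{X}_k\bar{T}} = B_k\sqrt{V_{\bar{T}}}/\sqrt{V_{\bar{X}_k}}$ and $R_{X_kT} = B_k\sqrt{V_T}/\sqrt{V_{X_k}}$, therefore:

$$\sqrt{V_{\bar{T}}} = \frac{R_{\bar{X}_k\bar{T}}\sqrt{V_{\bar{X}_k}}}{B_k} , \qquad \sqrt{V_T} = \frac{R_{X_kT}\sqrt{V_{X_k}}}{B_k} .$$

d. By substitution

$$R_{T\bar{X}_k} = \frac{(R_{\bar{X}_k\bar{T}})^2\sqrt{V_{\bar{X}_k}}}{R_{X_kT}\sqrt{V_{X_k}}} . \tag{4}$$

Since the standard deviations of the means ($\sqrt{V_{\bar{X}_k}}$) and of the individual values ($\sqrt{V_{X_k}}$) can be computed directly from the data, the correlation of the observed group means for test k with the true score can be computed from the reliabilities for the means ($R^2_{\bar{X}_k\bar{T}}$) and the individual scores ($R^2_{X_kT}$).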
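The result of steps a through d can be expressed as a short function. This is a sketch based on the formula as it follows from those steps, namely the reliability of the means times the standard deviation of the means, divided by the product of the square root of the individual-score reliability and the standard deviation of the individual scores. The function and argument names are hypothetical.

```python
import math


def corr_group_mean_with_true_score(rel_means, rel_scores, sd_means, sd_scores):
    """Correlation of the observed group means on one test with the true score.

    rel_means  : reliability of the group means (squared loading on the mean factor)
    rel_scores : reliability of the individual scores (squared loading on T)
    sd_means   : standard deviation of the group means
    sd_scores  : standard deviation of the individual scores
    """
    if not (0 < rel_scores <= 1 and 0 <= rel_means <= 1):
        raise ValueError("reliabilities must lie in the unit interval")
    if sd_means < 0 or sd_scores <= 0:
        raise ValueError("standard deviations must be positive")
    return rel_means * sd_means / (math.sqrt(rel_scores) * sd_scores)


# Illustrative values only: B_k = 1, V_T = 4, variance of true means = 1,
# individual error variance = 4, error variance of the means = 0.25.
print(corr_group_mean_with_true_score(0.8, 0.5, math.sqrt(1.25), math.sqrt(8.0)))
```

In the illustrative case the direct definition gives $B_kV_{\bar{T}}/\sqrt{V_TV_{\bar{X}_k}} = 1/\sqrt{5}$, which the function reproduces.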
Since equation (2) is a factor analytic model with one common factor $T_{ij}$ and equation (3) a factor model with the common factor $\bar{T}_j$, the reliabilities correspond to the square of the corresponding (standardized)

factor loading. Joreskog (1969£) discusses the factor analysis of congeneric 
measures in considerable detail. With more than three measures the model can be tested to see how consistent the congeneric assumption is with the data. Stronger assumptions about the tests (e.g., equivalency) can be readily incorporated into the analysis.

In summary then, the computational procedure involves:

1. Calculation of the group means for each of the k tests.

2. Creation of a new set of k variables by assigning to each individual the mean of his group on each of the k tests.

3. Intercorrelation of all 2k variables and computation of standard deviations.

4. Factor analysis of the k sets of test scores using as input the correlations among those tests from step 3. If Jöreskog's (1969a) confirmatory factor analysis procedure is used for this purpose a chi-squared goodness of fit measure will be obtained along with maximum likelihood estimates of the factor loadings (which are squared to obtain reliability estimates for that test).

5. Factor analysis of the k sets of group means using as input the correlations among those tests from step 3. If desired, factor score estimates of the true group means may be obtained.

6. The correlations of the k sets of group means with the true score can be computed from equation (4), where $R_{X_kT}$ is the factor loading for test k computed in step 4, $R_{\bar{X}_k\bar{T}}$ is the factor loading for test k group means computed in step 5, $\sqrt{V_{X_k}}$ is the standard deviation of the individual scores on test k computed in step 3, and $\sqrt{V_{\bar{X}_k}}$ is the standard deviation of the group means on test k computed in step 3.

7. The next step is computation of the standardized regression weights for predicting the true score from the 2k observed variables. The correlations among the observed variables from step 3, the correlations of the k sets of test scores with the true score (the factor loadings from step 4), and the correlations of the k sets of group means with the true score from step 6 may be used in the "normal equations" to solve for the desired regression weights (Walker & Lev, 1953, pgs. 324-336). These weights could in turn be used to estimate a standardized true score for each individual from his observed test scores and group mean on each of the tests.
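Step 7 amounts to solving the linear system "correlations among predictors times weights equals correlations of predictors with the true score". The sketch below is one way to do this with plain Gaussian elimination; it is an illustration, not the procedure of Walker and Lev, and all names and the two-predictor example values are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def standardized_weights(r_xx, r_xt):
    """Normal equations: r_xx is the predictor correlation matrix,
    r_xt the correlations of each predictor with the true score."""
    return solve_linear_system(r_xx, r_xt)


def estimate_standardized_true_score(weights, z_scores):
    """Weighted sum of one person's standardized observed variables."""
    return sum(w * z for w, z in zip(weights, z_scores))


# Hypothetical correlations for two predictors, for illustration only.
weights = standardized_weights([[1.0, 0.5], [0.5, 1.0]], [0.6, 0.6])
print(weights, estimate_standardized_true_score(weights, [1.0, -0.5]))
```

In the full procedure the predictor vector would hold the 2k standardized test scores and group means, and `r_xt` would hold the loadings from step 4 followed by the correlations from step 6.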
Variations

The above procedure requires that the means for each group on each test be computed and that these mean values be assigned to individuals so that a set of variables is created which may be intercorrelated. The advantage of this approach is that the reliabilities of the means may be computed for each test and the true group means estimated as factor scores. Instead of this analysis a factor analytic model might be postulated to account for all the correlations among the 2k observed ($X_{ijk}$ and $\bar{X}_{jk}$) variables. This model would have:

1. A total of (2k + 2) factors including $T_{ij}$, $\bar{T}_j$, and a residual factor for each of the observed 2k variables.

2. All residual factors involving different tests would be assumed independent, corresponding to the congeneric assumption, whereas each pair of residuals corresponding to the same test data would be nonindependent (because the group means are computed from the individual scores for a given test).

3. Each of the $X_{ijk}$ would load on $T_{ij}$ and each of the $\bar{X}_{jk}$ would load on $\bar{T}_j$. $T_{ij}$ and $\bar{T}_j$ would be nonindependent, their correlation being equal to the true correlation ratio.

4. Because reliabilities are desired the correlation matrix would be the basic input data and the variance of $T_{ij}$ and $\bar{T}_j$ would be set equal to unity.

5. This factor model would have a vector of order 2k of observed scores and a vector of order 2k + 2 factors and no vector of unique scores. When k ≥ 3 the model is overidentified and Jöreskog's (1969c) confirmatory factor analysis program could be used for estimation purposes. The program would estimate the factor loadings, the error variances, and error covariances among nonindependent residuals.

6. If the analysis were repeated specifying that for each test the loading of $X_{ijk}$ on $T_{ij}$ were equal to the loading of $\bar{X}_{jk}$ on $\bar{T}_j$, then a test of the assumption that for each test the reliability of the means equalled that of the individual scores would be the change in the chi-squared with k degrees of freedom.

In the event that it is desired only to improve the estimation of the overall true scores using group information, a more direct approach may be taken by coding the group information as a set of dummy variables ($D_j$). The model for this analysis given three congeneric tests is depicted in Figure 2:



Figure 2. Estimating true scores with dummy variables.

The covariance between the dummy variables and an observed set of test scores will be a function of true mean differences between groups and the reliability of the means for that test. In essence, the dummy variables add information about the true group means to the estimation of $T_{ij}$. Since the last dummy variable is perfectly predictable from the other dummy variables it may be deleted from the computations. Since the dummy variables in part represent overlapping information about group membership (e.g., a person in one group is not in the next group) the residuals are shown as
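correlated in Figure 2. The factors are now ($T_{ij}$, $e_{ij1}$, $e_{ij2}$, $e_{ij3}$, $e_{D_1}$, $e_{D_2}$, ..., $e_{D_{(J-1)}}$) and the observed vector ($X_{ij1}$, $X_{ij2}$, $X_{ij3}$, $D_1$, $D_2$, ..., $D_{(J-1)}$). The hypothesized factor loading matrix is:

$$\begin{pmatrix}
B_1 & 1 & 0 & 0 & 0 & 0 & \cdots & 0 \\
B_2 & 0 & 1 & 0 & 0 & 0 & \cdots & 0 \\
B_3 & 0 & 0 & 1 & 0 & 0 & \cdots & 0 \\
B_{D_1} & 0 & 0 & 0 & 1 & 0 & \cdots & 0 \\
B_{D_2} & 0 & 0 & 0 & 0 & 1 & \cdots & 0 \\
\vdots & & & & & & \ddots & \\
B_{D_{(J-1)}} & 0 & 0 & 0 & 0 & 0 & \cdots & 1
\end{pmatrix}$$

The hypothesized variance-covariance matrix of the factors is:

$$\begin{pmatrix}
1 & & & & & & & \\
0 & V_{e_{ij1}} & & & & & \text{Symmetric} & \\
0 & 0 & V_{e_{ij2}} & & & & & \\
0 & 0 & 0 & V_{e_{ij3}} & & & & \\
0 & 0 & 0 & 0 & V_{e_{D_1}} & & & \\
0 & 0 & 0 & 0 & C_{e_{D_1}e_{D_2}} & V_{e_{D_2}} & & \\
\vdots & & & & \vdots & & \ddots & \\
0 & 0 & 0 & 0 & C_{e_{D_1}e_{D_{(J-1)}}} & C_{e_{D_2}e_{D_{(J-1)}}} & \cdots & V_{e_{D_{(J-1)}}}
\end{pmatrix}$$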
This approach becomes computationally awkward when the number of groups becomes large, in which case it may prove easier to first compute the group means and proceed as shown in the previous section.

In passing, the relationship with a one way analysis of variance with a fallible dependent variable might be noted. The problem in that case would be whether the true means differed from one treatment group to the next, i.e., whether $V_{\bar{T}} \ne 0$. In the model used above to test for equal reliabilities this would correspond to the hypothesis that $\rho_{T\bar{T}} \ne 0$, since this correlation is the correlation ratio, i.e., $\rho_{T\bar{T}} = \sqrt{V_{\bar{T}}/V_T}$. To test the hypothesis the analysis could be rerun with $\rho_{T\bar{T}}$ a "fixed" parameter set equal to 0, and the difference in chi-squared values (one degree of freedom) would be the appropriate significance test. One might consider using the congeneric model for the analysis of variance where the treatment effects are measured in terms of several symptoms which presumably reflect some underlying process which is not directly measured. Provided that the errors of measurement between symptoms are independent and the symptoms are linearly related to the underlying process, the congeneric model might provide a more valid test of the hypothesis.
ERRORS OF INFERENCE DUE TO ERRORS OF MEASUREMENT

Robert L. Linn and Charles E. Werts

Educational Testing Service

Abstract

Failure to consider errors of measurement when using partial correlation or analysis of covariance techniques can result in erroneous conclusions. Certain aspects of this problem are discussed and particular attention is given to issues raised in a recent article by Brewer, Campbell, and Crano.

Brewer, Campbell, and Crano (1970) have justifiably criticized the use 
of partial correlation procedures in hypothesis testing research where errors 
of measurement are not t^tken into consideration. Ignoring measurement errors 
is much more serious w^en dealing with partial correlations than when dealing, 
with simple zerp^prder correlations. In the latter case we knew that the 
effect of errors of measurement, that Eire mutually uncorrelated and uncorre- 
lated with true "scores, is to reduce the absolute value of the zero-order 
correlation between the fallible measures* As Lord (1965) has pointed out, 
however, we cannot ordinarily know the effect of such errors of measurement 
on a partial correlation. Errors of measurement can increase or decrease 
the magnitude of a partial correlation and may even result in a partial corre- 
lation -of a different sign. 

As an alternative, Brewer et al. (1970) have suggested that factor 
analytic techniques be used to test a single -factor model before drawing 
conclusions about the nature of underlying conceptual variables. The pur- 
pose of the present paper is to reconsider the issues raised by these authors 
and the reasoning that led to their conclusions. Attention also will be 
given to some related arguments that were made in a recent attack on some 
commonly used methods for the evaluation of compensatory educational programs 
(Campbell & Erlebacher, 1970). Our thesis is that the basic problem is a
lack of relevant information, a problem that cannot be resolved by the choice
of a statistical procedure.
Relationship between Factor and Partial Correlation Analyses

Ignoring errors of measurement, the relationship between the loadings on
a single common factor and the partial correlations in the case of three
variables is straightforward. The squared factor loadings on a single common
factor can be expressed:

$$a_i^2 = \frac{\rho_{ij}\,\rho_{ik}}{\rho_{jk}} \qquad (1)$$

for $i, j, k = 1, 2, 3$; $i \neq j \neq k$, where $a_i$ is the factor loading on the
single common factor for variable $i$ and the $\rho$'s are the intercorrelations
among the variables $i$, $j$, $k$. When $\rho_{jk} = 0$, $a_i^2$ is undefined. Assuming
none of the three zero-order correlations equals zero, the squared factor
loading can be written as a function of the partial correlation, $\rho_{jk.i}$:

$$a_i^2 = 1 - C\,\rho_{jk.i} , \qquad (2)$$

where

$$C = \frac{\sqrt{(1 - \rho_{ij}^2)(1 - \rho_{ik}^2)}}{\rho_{jk}} .$$

Provided that $C$ is positive, it may be seen from (2) that when $\rho_{jk.i} = 0$,
$a_i^2 = 1.0$ and when $\rho_{jk.i} < 0$, $a_i^2 > 1.0$.

Frederic Lord (personal communication) suggested that the relationship
between the factor and partial correlation analyses could be clarified by an
example such as the one depicted in Figure 1. Given $\rho_{12} = .50$, the
possible values of $\rho_{13}$ and $\rho_{23}$ are contained in the ellipse in Figure 1.
Regions of the figure that contain negative partial correlations are indicated.
Factor loadings are denoted by $a_i$ and regions that contain imaginary
loadings or squared loadings greater than 1.0 are indicated.

Insert Figure 1 about here 



On line segment ao $\rho_{23.1} = 0$ and $a_1^2 = 1.0$, on line segment bd
$\rho_{13.2} = 0$ and $a_2^2 = 1.0$, and on line segments aeb and cfd $\rho_{12.3} = 0$ and
$a_3^2 = 1.0$. Imaginary values of the $a$'s occur when one of the three
zero-order correlations is negative while the other two are positive.
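
Equations (1) and (2) can be checked numerically. The following Python sketch is illustrative only: the function names and the three sets of correlations are assumptions chosen for the illustration (each set fixes the first correlation at .50, as in Figure 1) and are not values read from the figure.

```python
import math


def squared_loadings(r12, r13, r23):
    """Equation (1): a_i^2 = r_ij * r_ik / r_jk for i = 1, 2, 3."""
    return (r12 * r13 / r23, r12 * r23 / r13, r13 * r23 / r12)


def partial_corr(r_jk, r_ij, r_ik):
    """Partial correlation between j and k with i partialed out."""
    return (r_jk - r_ij * r_ik) / math.sqrt((1 - r_ij**2) * (1 - r_ik**2))


def squared_loading_from_partial(r_jk, r_ij, r_ik):
    """Equation (2): a_i^2 = 1 - C * r_jk.i, with C as defined in the text."""
    c = math.sqrt((1 - r_ij**2) * (1 - r_ik**2)) / r_jk
    return 1 - c * partial_corr(r_jk, r_ij, r_ik)


# Hypothetical correlation triples (r12, r13, r23); each lies inside the ellipse.
cases = {
    "all partials positive": (0.50, 0.40, 0.30),
    "negative partial r23.1": (0.50, 0.80, 0.30),
    "one negative correlation": (0.50, 0.30, -0.20),
}

for label, (r12, r13, r23) in cases.items():
    a1_squared = squared_loadings(r12, r13, r23)[0]
    print(label, round(a1_squared, 4), round(partial_corr(r23, r12, r13), 4))
```

In the second case the partial correlation with variable 1 held constant is negative and the squared loading for variable 1 exceeds 1.0; in the third case one correlation is negative and the squared loading is negative, so the loading itself is imaginary.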

Bias in Partial Correlation 

Brewer et al. (1970) argue that errors of measurement introduce a
systematic bias into partial correlations. More specifically, they state:
". . . the assumption is made that the variable being partialled out contains
no unique components and is measured without error. Using partialling
techniques when these assumptions are not met introduces systematic bias toward
the unparsimonious conclusion that more conceptual factors are involved in a
phenomenon than may actually be the case" (Brewer et al., 1970, pp. 1-2).
Although it is true that this may be the effect of a violation of the
assumption of an error free measure, the bias may be in the opposite direction. It
is easy to construct an example where the direction of the bias is toward a
more parsimonious conclusion that fewer conceptual factors are involved in a
phenomenon than is actually the case. Suppose, for example, that three latent
variables ($T_1$, $T_2$, and $T_3$) had the following intercorrelations in the
population:

$$\rho_{T_1T_2} = .6 , \quad \rho_{T_1T_3} = .6 , \quad \text{and} \quad \rho_{T_2T_3} = .18 .$$

The correlation between $T_2$ and $T_3$ with $T_1$ partialed out is $-.2812$
and the corresponding conclusion is that more than one conceptual variable
is involved in this phenomenon. Suppose, however, that only a fallible
measure of the first variable, say $X_1$, was available, where

$$X_1 = T_1 + E_1$$

and $E_1$ is uncorrelated with $T_1$, $T_2$, or $T_3$. Further, assume that
the variance of $X_1$ is equal to twice the variance of $T_1$ (i.e., the
reliability of $X_1$ is .50). Under these conditions the resulting
intercorrelations among $X_1$, $T_2$, and $T_3$ would be:

$$\rho_{X_1T_2} = .6\sqrt{.5} \approx .424 , \quad \rho_{X_1T_3} = .6\sqrt{.5} \approx .424 , \quad \text{and} \quad \rho_{T_2T_3} = .18 .$$

The correlation between $T_2$ and $T_3$ with $X_1$ partialed out would be 0.0
which would result in the more parsimonious, but erroneous, conclusion that a
second conceptual variable is not required. There is no intention to imply
by this illustration that the bias of errors of measurement is typically, or
even frequently, in the direction of producing a partial correlation that is
closer to zero. Rather, the point is that the direction of the bias cannot
be determined without imposing additional assumptions (e.g., all reliabilities
and all zero-order and partial correlations among true scores are nonnegative)
and/or obtaining additional information such as the reliabilities of the
measures. Given classical test theory assumptions, an estimate of the partial
correlation among underlying true scores may be obtained by simply applying
standard corrections for attenuation to the zero-order correlations. As Lord
(1963) has noted, the need to make corrections for attenuation poses
somewhat of a dilemma, since, first, it is often hard to obtain the particular
kind of reliability coefficients that are required for making the appropriate
correction, and, further, the partial corrected for attenuation may be
seriously affected by sampling errors. These obstacles can hardly justify the
use of an uncorrected coefficient that may have the wrong sign, however"
(Lord, 1963, p. 36).
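
The arithmetic of the preceding example can be verified with a short Python sketch. The variable and function names are ours and purely illustrative; the correlations and the reliability of .50 are those given in the text.

```python
import math


def partial_corr(r_jk, r_ij, r_ik):
    """Partial correlation between j and k with i partialed out."""
    return (r_jk - r_ij * r_ik) / math.sqrt((1 - r_ij**2) * (1 - r_ik**2))


# Population correlations among the latent variables, as stated in the text.
rho_t1t2 = 0.6
rho_t1t3 = 0.6
rho_t2t3 = 0.18

# Partial correlation of T2 and T3 with the error-free T1 held constant.
true_partial = partial_corr(rho_t2t3, rho_t1t2, rho_t1t3)

# A fallible X1 with reliability .50 correlates with T2 and T3 at the
# true-score correlation multiplied by the square root of the reliability.
reliability_x1 = 0.50
rho_x1t2 = rho_t1t2 * math.sqrt(reliability_x1)
rho_x1t3 = rho_t1t3 * math.sqrt(reliability_x1)

# Partial correlation of T2 and T3 with the fallible X1 held constant.
fallible_partial = partial_corr(rho_t2t3, rho_x1t2, rho_x1t3)

print(round(true_partial, 4), round(rho_x1t2, 3), round(abs(fallible_partial), 4))
```

The first partial is clearly negative, while the second vanishes (up to rounding error) because the product of the two attenuated correlations equals .18.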

The Single Factor Model vs. Partial Correlations 

As noted above, Brewer et al. (1970) have suggested that a single-factor
model be tested before conclusions are drawn about the nature of underlying
conceptual variables from partial correlations. We shall argue that partial
correlation analyses and factor analyses are based on different models and
pose different questions. Knowing that a single factor can reproduce the
intercorrelations among three observed fallible variables is not sufficient
to draw conclusions about the partial correlations among the underlying
conceptual variables or true scores that correspond to the observed scores.

Assuming that three infallible measures ($T_1$, $T_2$, and $T_3$) have a
multivariate normal distribution, the partial correlation between $T_2$ and
$T_3$ with $T_1$ partialed out has a very simple interpretation. It is equal
to the zero-order correlation between $T_2$ and $T_3$ for any subpopulation
defined by a particular value of $T_1$. Thus, it provides a means of
investigating the relationship between $T_2$ and $T_3$ with $T_1$ held constant in the
above sense. The question of whether or not $T_2$ and $T_3$ are related when
$T_1$ is held constant is not the same as the question answered by a test for
single factoredness for the observed scores. This is, in principle,
acknowledged by Brewer et al. (1970) in footnote number 3 where they discuss an
example in which the control variable (I.Q.) has a factor loading of .43.
They conclude that "...if one has 'factored out' a variable upon which I.Q.
loads only .43, one has not in any meaningful sense 'factored out I.Q.'"
(Brewer et al., 1970, p. 7). They go on to indicate that they are working
on a technique of "focused factoring," wherein the control variables are
used to define the factor. Hopefully this procedure would exclude from the
commonality of a control variable only that variance that properly might be
considered error variance.

If the observed variables ($X_i$) are related to their underlying true
scores ($T_i$) by the model

$$X_i = T_i + E_i , \quad i = 1, 2, 3 ,$$

where the errors ($E_i$) are mutually uncorrelated and are uncorrelated with
the true scores, then (1) may be expressed in terms of the correlations among
the true scores, $\rho_{T_iT_j}$, and the reliabilities of the observed measures,
$\rho_{ii}$, i.e., the variance of $T_i$ divided by the variance of $X_i$. Thus

$$a_i^2 = \frac{\rho_{ii}\,\rho_{T_iT_j}\,\rho_{T_iT_k}}{\rho_{T_jT_k}} . \qquad (3)$$

The correlation between $T_j$ and $T_k$ with $T_i$ partialed out is proportional
to

$$\rho_{T_jT_k} - \rho_{T_iT_j}\,\rho_{T_iT_k} ,$$

which, given equation (3), equals:

$$\rho_{T_iT_j}\,\rho_{T_iT_k}\left(\frac{\rho_{ii}}{a_i^2} - 1\right) .$$

Considering cases where a single factor reproduces the intercorrelations
among $X_1$, $X_2$, and $X_3$ and $0 < a_i^2 < 1$ ($i = 1, 2, 3$), the above expression
can be seen to have the following implications:

A. When $\rho_{T_iT_j}$ and $\rho_{T_iT_k}$ have the same sign,

   1. $a_i^2 < \rho_{ii}$ implies $\rho_{T_jT_k.T_i} > 0$,

   2. $a_i^2 = \rho_{ii}$ implies $\rho_{T_jT_k.T_i} = 0$,

   3. $a_i^2 > \rho_{ii}$ implies $\rho_{T_jT_k.T_i} < 0$.

B. When $\rho_{T_iT_j}$ and $\rho_{T_iT_k}$ have opposite signs,

   1. $a_i^2 < \rho_{ii}$ implies $\rho_{T_jT_k.T_i} < 0$,

   2. $a_i^2 = \rho_{ii}$ implies $\rho_{T_jT_k.T_i} = 0$,

   3. $a_i^2 > \rho_{ii}$ implies $\rho_{T_jT_k.T_i} > 0$.

These results show that when the correlations among the observed scores
are reproduced by a single factor with squared loadings between 0 and 1, no
conclusions are warranted regarding the partial correlations among the true
scores. Given positive reliabilities and nonzero intercorrelations among
observed scores, if the three observed variables do not fit the single
factor model, then the three partial correlations among true scores may be
positive or negative but not zero.
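
The dependence of the sign of the true-score partial correlation on the comparison between the squared loading and the reliability can be illustrated numerically. The Python sketch below is a minimal illustration under assumed values: the function names, the reliability of .50, and the true-score correlations are hypothetical and were chosen only so that the three cases of implication A arise in turn.

```python
def squared_loading(reliability_i, t_ij, t_ik, t_jk):
    """Equation (3): squared loading of X_i on the single common factor."""
    return reliability_i * t_ij * t_ik / t_jk


def true_partial_numerator(t_ij, t_ik, t_jk):
    """Numerator of the partial correlation of T_j and T_k given T_i."""
    return t_jk - t_ij * t_ik


def numerator_from_loading(reliability_i, a_i_squared, t_ij, t_ik):
    """The same numerator rewritten by means of equation (3)."""
    return t_ij * t_ik * (reliability_i / a_i_squared - 1)


reliability = 0.50          # assumed reliability of X_i
t_ij = t_ik = 0.50          # assumed true-score correlations, same sign

# Three assumed values of the remaining true-score correlation.
for t_jk in (0.40, 0.25, 0.20):
    a_sq = squared_loading(reliability, t_ij, t_ik, t_jk)
    numerator = true_partial_numerator(t_ij, t_ik, t_jk)
    print(t_jk, round(a_sq, 4), round(numerator, 4))
```

With these values the squared loading is below, equal to, and above the reliability in turn, and the numerator of the true-score partial correlation is correspondingly positive, zero, and negative, even though every squared loading lies between 0 and 1.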

The relationship between the observed loadings on a single common factor,
the partial correlations among observed scores, and the partial correlations
among true scores may be clarified by the example depicted in Figure 2. For
the case $\rho_{T_1T_2} = .50$ and $\rho_{11} = \rho_{22} = \rho_{33} = .50$, Figure 2 shows the
possible values of $\rho_{T_1T_3}$ and $\rho_{T_2T_3}$. A set of regions is defined within
which the factor loadings on a single common factor, the partial correlations
among observed scores, and the partial correlations among true scores have
specified characteristics. The ellipse in Figure 2 contains the values for
which the determinant of the matrix containing the intercorrelations of $T_1$,
$T_2$, and $T_3$ is greater than or equal to zero. Larger values of $\rho_{T_1T_2}$
would define a thinner ellipse and smaller values a rounder ellipse. The
numbers inside the ellipse identify the various regions of the ellipse, and
the letters identify line segments separating regions. For the regions in
Figure 2, the factor loadings ($a_i$) for a single common factor that will
reproduce the intercorrelations among the observed scores, the partial
correlations among the observed scores ($\rho_{jk.i}$), and the partial correlations among
the true scores ($\rho_{T_jT_k.T_i}$) are shown in Table 1. The values of $a_i$, $\rho_{jk.i}$,
and $\rho_{T_jT_k.T_i}$ for values of $\rho_{T_iT_j}$ and $\rho_{T_iT_k}$ on the boundaries between
regions are shown in Table 2.

Insert Figure 2 about here

Insert Tables 1 and 2 about here




As was stated in implications A.2 and B.2 above, $\rho_{T_jT_k.T_i}$ equals
zero when $a_i^2 = \rho_{ii}$. This occurs on line segments co, do, io, jo, cmd,
and inj. When $a_i^2 = 1$ (line segments bo, eo, ho, and ko) the partial
correlation, $\rho_{jk.i}$, among observed scores is zero; however, $\rho_{T_jT_k.T_i}$ is nonzero.
The location of line boh and line eok depends on the magnitude of $\rho_{11}$ and $\rho_{22}$:
boh is defined by points where $\rho_{T_2T_3} = \rho_{11}\,\rho_{T_1T_2}\,\rho_{T_1T_3}$ and eok is defined by
points where $\rho_{T_1T_3} = \rho_{22}\,\rho_{T_1T_2}\,\rho_{T_2T_3}$. A line where $a_3^2 = 1$ does not exist
for this example because there are no possible values of $\rho_{T_1T_3}$ and $\rho_{T_2T_3}$
for which $\rho_{T_1T_2}$ equals $\rho_{33}\,\rho_{T_1T_3}\,\rho_{T_2T_3}$. Regions 2a, 3a, 4, 6a, 7a, and 8
are of interest since they define combinations of $\rho_{T_iT_j}$ and $\rho_{T_iT_k}$ for
which a partial correlation for observed scores and a partial correlation for
true scores have opposite signs. Regions 1, 2a, 3a, 4, 5, 6a, 7a, and 8 are
where a satisfactory single-factor solution is obtained yet all three
correlations between pairs of true scores with the third true score partialed out are
nonzero. Different conclusions about the number of underlying conceptual
variables involved in the phenomenon presumably would be drawn for instances
in those regions.

This problem should not be dealt with by simply invoking the principle
of parsimony and thereby concluding that the fit of a single factor model
indicates that there is only one dimension underlying the phenomenon. Rather,
the problem should be dealt with by obtaining the additional information that
is necessary to make inferences within a given model. A brief discussion of
the use of multiple measures to obtain the needed information is presented
below in the section on needed additional information.

Errors of Measurement in the Analysis of Covariance 

Campbell and Erlebacher (1970) have provided a much needed criticism
of the common misuse of the analysis of covariance as a means of trying to
adjust for preexisting differences between experimental and control groups
for the evaluation of compensatory education programs. They argue that
"error" and "uniqueness" in the covariate result in bias when the groups
differ on the covariate, the bias being in the direction of underestimating
the slope of the regression of
the dependent variable on the covariate (for a good discussion see Cochran,
1968). Porter (1967) has illustrated the nature of the resulting bias for
various group differences in means on the covariate and on the dependent
variable. When using the analysis of covariance, bias due to errors of
measurement in the covariate might make a compensatory education program look
bad (or good).

The effect of "uniqueness" depends on its sources. If uniqueness is due
to errors of validity (e.g., a perfectly reliable "symptom" of the underlying
variable), then bias will result in the same way that it does from
unreliability. On the other hand, if uniqueness merely refers to unshared variance
between the covariate and the dependent variable as in Campbell and Erlebacher's
(1970) treatment of covariance adjustments, then the question of bias is
ambiguous. Given independent errors, unshared variance may be due to
unreliability, invalidity, or a lack of perfect correlation between underlying
variables. The latter is not a source of bias and should not be corrected for as
is done by Campbell and Erlebacher.
This problem needs to be viewed from the perspective of Lord's (1967)
paradox. Lord has shown that the comparison of preexisting groups by means
of an analysis of covariance (statistician 2) and by means of an analysis of
difference scores (statistician 1) can result in paradoxically different
results, both of which are manifestly correct. In his hypothetical
illustrative example, Lord depicted an experiment in which girls received one diet
and boys another. For each group the mean and variance of the final weight
were identical to the mean and variance of the initial weight. There were
preexisting differences between the groups in mean weight, and for each
group the within-group correlation between initial and final weight was .50.
Assuming that the weight measures are error free, the above correlation
would be the correlation between true initial weight and true final weight.
In the absence of measurement errors the analysis of mean change would
indicate no "treatment" effect, whereas the analysis of covariance would
indicate a "treatment" effect.

Campbell and Erlebacher (1970) have suggested that in pretest-posttest
designs a "common-factor coefficient" might be used to correct for errors of
measurement and uniqueness in the covariate. Using the proper common factor
coefficients for both pretest and posttest in the standard correction for
attenuation formula would result in a "corrected" pretest-posttest
correlation of 1.00. Assuming equal coefficients for the pretest and the posttest,
the common factor coefficient for Lord's example would be .50. Applying this
"correction" would increase the slope of the within-group regression lines to
1.00 and result in identical intercepts for the two groups. In essence,
Campbell and Erlebacher have devised a roundabout way of siding with Lord's
first statistician. However, they have not resolved Lord's paradox. Rather
than impose a restriction, such as the one that the "corrected" correlation
between pretest and posttest be 1.00 (which, in our opinion, is unjustified),
it would seem far better to conclude with Lord (1967) that "... there
simply is no logical or statistical procedure that can be counted on to make
proper allowances for uncontrolled preexisting differences between groups"
(p. 305).
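
The arithmetic behind the two statisticians' analyses, and behind the common-factor "correction," can be sketched in a few lines of Python. The group means below are hypothetical (they are not Lord's figures); the within-group slope of .50 follows from the stated correlation of .50 and the equal initial and final variances. We also assume, for the sake of the sketch, that the "correction" amounts to dividing the within-group slope by the common-factor coefficient.

```python
def mean_change_difference(pre_a, post_a, pre_b, post_b):
    """Statistician 1: difference between groups in mean change."""
    return (post_a - pre_a) - (post_b - pre_b)


def covariance_adjusted_difference(pre_a, post_a, pre_b, post_b, slope):
    """Statistician 2: difference in final means adjusted for initial means."""
    return (post_a - post_b) - slope * (pre_a - pre_b)


# Hypothetical group means; within each group the mean does not change.
boys_pre, boys_post = 70.0, 70.0
girls_pre, girls_post = 60.0, 60.0

# Equal initial and final variances, so the within-group slope equals the
# within-group correlation of .50.
within_slope = 0.50

# Assumed form of the "correction": slope divided by the coefficient of .50.
common_factor_coefficient = 0.50
corrected_slope = within_slope / common_factor_coefficient

change = mean_change_difference(boys_pre, boys_post, girls_pre, girls_post)
adjusted = covariance_adjusted_difference(
    boys_pre, boys_post, girls_pre, girls_post, within_slope)
adjusted_corrected = covariance_adjusted_difference(
    boys_pre, boys_post, girls_pre, girls_post, corrected_slope)

print(change, adjusted, adjusted_corrected)
```

Under these assumed means the analysis of mean change gives a difference of zero, the analysis of covariance gives a nonzero adjusted difference, and the "corrected" slope of 1.00 returns the covariance analysis to the answer of the first statistician.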


Needed Additional Information for Fallible Measures 

Dealing with fallible measures will generally require additional
assumptions and additional information. In some instances, using parallel forms
of one or more of the measures may provide the needed additional information.
One difficulty with this procedure is that most observed measures are really
symptoms or indirect measures of the variable or influence to be measured,
which is to say that even if the symptoms were measured with perfect
reliability, they would be imperfectly correlated with the "true" variable. The
researcher must decide which symptoms are reflections of the relevant
underlying variable. This question is crucial since different sets of symptoms
will typically define different "true" factors depending on the particular
statistical procedure employed. The multitrait-multimethod approach
introduced by Campbell and Fiske (1959) attempts to deal with this validity problem
by using different methods of measuring the same variable. Different-method
measures of the same trait typically will correlate less than
equivalent measures, i.e., in this model the classical psychometric approach
using parallel forms is apt to underestimate correlations among underlying
conceptual variables. An alternative way of stating this problem is to assume
that part of the correlation between the two measures $X_1$ and $X_1^*$ of $T_1$ is
due to correlated errors of measurement and that factors causing this
correlation are uncorrelated with the true scores. In this case, the square root of
the correlation between $X_1$ and $X_1^*$ no longer provides a reasonable estimate
of the correlation between $X_1$ and $T_1$. Assuming that the errors are
positively correlated, the correlation between $X_1$ and $X_1^*$ will overestimate
the squared correlation between $X_1$ and $T_1$ and using this inflated
coefficient to correct for attenuation will result in the kind of undercorrection
that Brewer et al. (1970) warned against. Correlated errors may, in fact, be
one of the reasons that Brewer et al. wanted to correct for "uniqueness."
There are advantages, however, to formulating the problem in terms of
correlated errors rather than simply saying that we should correct for uniqueness.
The former makes it possible to devise procedures for estimating the desired
coefficient (the correlation between $X_1$ and $T_1$) given the possibility of
either positively or negatively correlated errors, whereas the latter only
allows the conclusion that the correlation between $X_1$ and $X_1^*$ overestimates
the desired coefficient if the errors are in fact positively correlated.
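
The effect of correlated errors on a parallel-forms coefficient can be illustrated with a small Python sketch. It rests on assumptions that go beyond the text: both forms are taken to equal $T_1$ plus an error, the two errors are given equal variances, and the reliability of .50, the error correlations, and the observed correlation of .30 with an error-free criterion are hypothetical values.

```python
import math


def parallel_form_correlation(reliability, error_corr):
    """Correlation between X1 and X1* when each is T1 plus an error, the
    errors have equal variances, and the errors correlate error_corr."""
    return reliability + error_corr * (1 - reliability)


def corrected_for_attenuation(observed_r, coefficient):
    """Correct a correlation with an error-free criterion for error in X1."""
    return observed_r / math.sqrt(coefficient)


reliability = 0.50   # assumed squared correlation between X1 and T1
observed_r = 0.30    # assumed correlation of X1 with an error-free criterion

for error_corr in (0.0, 0.20, -0.20):
    coefficient = parallel_form_correlation(reliability, error_corr)
    corrected = corrected_for_attenuation(observed_r, coefficient)
    print(error_corr, round(coefficient, 2), round(corrected, 4))
```

With uncorrelated errors the coefficient equals the reliability. Positively correlated errors inflate the coefficient and so yield an undercorrection, while negatively correlated errors deflate it and yield an overcorrection.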

Conclusion 

From our perspective, "focusing on the conceptual problem of choosing a
one-factor vs. a two-factor model" (Brewer et al., 1970, p. 5) distracts the
researcher's attention from the task of constructing a model which is
consistent with everything we know or hypothesize about the phenomena under study.
Any inferences will necessarily be no more valid than the assumptions made
about reality. For heuristic purposes we have assumed that the linear
additive model was relevant; however, there is no rule of nature that effects are
either linear or additive. No provision was made, e.g., for catalytic,
feedback, or interactional type influences. It is important for the research
design to be set up to study the question of which of the plausible alternative
models more closely simulates reality. Rather than focus on the conceptual
problem of choosing a one-factor vs. a two-factor model, it seems to us far
more worthwhile to spend time in designing the study to explore the relevant
alternate models, ensuring collection of the information necessary to test
which is the best simulation of reality. Depending on the problem, the factor
model may be one of the alternatives. The assumption that the factor model is
a priori relevant appears to us to be unjustified given the current state of
the art.

Footnotes

1. The research reported herein was performed pursuant to Grant No.
OEG-2-700033(509) with the United States Department of Health, Education,
and Welfare and the Office of Education.

2. We are grateful to Frederic M. Lord for suggesting the idea that was
used for the illustrative example in Figure 1.





Table 1

Values of Factor Loadings and Partial Correlations
for Regions of Figure 2

Column groups, in order: Region; Factor Loadings ($a_1$, $a_2$, $a_3$); Partial
Correlations Among Observed Scores ($\rho_{23.1}$, $\rho_{13.2}$, $\rho_{12.3}$); Partial
Correlations Among True Scores ($\rho_{T_2T_3.T_1}$, $\rho_{T_1T_3.T_2}$, $\rho_{T_1T_2.T_3}$).






1 


+ 




+ 


+ 


+ 




+ + 
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2a 


+ 


+ 


+ 


+ 


+ 


+ 


+ 


+ 


2b 


a 

G 


+ 


+ 




+ 


+ 


+ 
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3a 


+ 


+ 


+ 


+ 


+ 


+ 


+ 


+ 


3b 


+ 


G 


+ 


+ 




+ 




+ 


4 .. 


+ 


+ 


+ 


+ 


+ 


+ 


+ ' + 




5 






+ 






+ 
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6a 






+ 
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6b 


G 
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+ 
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+ - 
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7a 






+ 
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- * + 




, 7b 
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+ 
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+ 
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8 






+ 






+ 






9 


i 
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i 




+ 


+ 
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10 


-- 1 
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+ 




+ 


+ 


+ 



G denotes that the factor loading is greater than 1.0 in absolute value. 
Table 2

Values of Factor Loadings and Partial Correlations
for Lines Separating Regions in Figure 2

Column groups, in order: Line Segment; Factor Loadings ($a_1$, $a_2$, $a_3$); Partial
Correlations Among Observed Scores ($\rho_{23.1}$, $\rho_{13.2}$, $\rho_{12.3}$); Partial
Correlations Among True Scores ($\rho_{T_2T_3.T_1}$, $\rho_{T_1T_3.T_2}$, $\rho_{T_1T_2.T_3}$).




ao 




0 


0 




+ + 


+ * 


+ 


bo 


1 




+ 


0 


+ + 


■ . + 


+ 


CO 




+ 


+ 


+ 


+ + 


0 + 


+ 


do 


+ 




+ 


+ 


+ + 


+ 0 


+ 


eo 


+ 


1 


+ 


+ 


0 + 


+ 


+ 


fo 


. 0 


U 


c 


+ 


- ... + 




+ 


go 


u 


0 


0 


+ 


+ 


* + 


+ 


ho 


-1 




..+ 


0 


+ 


+ 


+ 


io 






+ 




+ 


0 


+ 


do 




"^22 


+ . 




+ 


. - , * 0 


+ 


.ko 




■ -1 






0 + 


+ 


+ 


eo 


0 


U 


0 




+ + 


+ 


+ 


cmd 


+ • 


+ 




+ 






0 


inj • 

■ . . . 










+ 




0 



U denotes that the factor loading is undefined.
Figure Captions

Fig. 1. Regions which define values of factor loadings and partial
correlations for possible values of $\rho_{X_1X_3}$ and $\rho_{X_2X_3}$ given $\rho_{X_1X_2} = .50$.

Fig. 2. Regions which define values of factor loadings and partial
correlations for possible values of $\rho_{T_1T_3}$ and $\rho_{T_2T_3}$ given $\rho_{T_1T_2} = .50$,
$\rho_{11} = \rho_{22} = \rho_{33} = .50$.
Identification and Estimation in Path Analysis
with Unmeasured Variables

Abstract

A variety of path models involving unmeasured variables are formulated
in terms of Jöreskog's (1970a) general model for the analysis of covariance
structures.



Identification and Estimation in Path Analysis 
with Unmeasured Variables* 

A variety of authors (e.g., Blalock, 1969; Costner, 1969; Heise, 1969)
have applied path analysis to problems involving multiple indicators of
underlying constructs. An important and often algebraically complex feature of
such analysis is the determination of identifiability of model parameters.
The purpose of this discussion is to demonstrate how a visual inspection of
the path diagram can be used to simplify the identification question and how
these problems may be formulated in Jöreskog's (1970a) general model.

I. A Single Factor Model 

Consider the case of a single underlying factor ($F_1$) with three
observed measures ($X_1$, $X_2$, and $X_3$) as shown in Figure 1.a. The factor
loadings ($\rho_{X_iF_1}$) in this model equal the standardized path coefficients
($b_1^*$, $b_2^*$, and $b_3^*$), given the assumption that the residuals $e_1$, $e_2$, and $e_3$
are independent of each other and of the factor. It is convenient, though
not necessary, to assume that both measured and unmeasured variables are
standardized. For heuristic purposes observed correlations will be designated
with "r" and expected values of these correlations by "$\rho$". The expected
correlations will differ from the corresponding observed correlations because
of sampling and model specification errors.



*The research reported herein was performed pursuant to Grant No.
OEG-2-700035(509) with the United States Department of Health, Education, and
Welfare, and the Office of Education.

Fig. 1.a. A Single Factor Model



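A path analysis of this model yields the equations:

$$\rho_{12} = b_1^* b_2^* , \quad \rho_{13} = b_1^* b_3^* , \quad \text{and} \quad \rho_{23} = b_2^* b_3^* . \qquad (1)$$

Assuming nonzero correlations, equations (1) yield:

$$(b_1^*)^2 = \frac{\rho_{12}\,\rho_{13}}{\rho_{23}} = \rho_{X_1F_1}^2 , \quad (b_2^*)^2 = \frac{\rho_{12}\,\rho_{23}}{\rho_{13}} = \rho_{X_2F_1}^2 , \quad \text{and} \quad (b_3^*)^2 = \frac{\rho_{13}\,\rho_{23}}{\rho_{12}} = \rho_{X_3F_1}^2 . \qquad (2)$$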

Given only three observed measures the model is just identified, i.e., the
observed and expected correlations are identical. With more than three measures

$$\rho_{X_iF_1}^2 = (b_i^*)^2 = \frac{\rho_{ij}\,\rho_{ik}}{\rho_{jk}} , \quad \text{where } i \neq j \neq k , \qquad (2a)$$

assuming $\rho_{jk} \neq 0$. If there were a causal linkage (e.g., $F_1 \rightarrow I_1 \rightarrow I_2 \rightarrow X_i$)
from $F_1$ to $X_i$ then $\rho_{X_iF_1}$ would be the product of the intervening path
coefficients, i.e., the product of the path coefficients in the chain from
$F_1$ to $X_i$ would be identified. If any loading exceeded unity, the model
would be rejected. When there are $m > 3$ observed measures then the loadings
will be overidentified. The number of overidentifying restrictions is simply
the number of distinct correlations $m(m - 1)/2$ less the number ($m$) of
$\rho_{X_iF_1}$ to be estimated. Maximum likelihood or least squares estimates for
overidentified models can be obtained using Jöreskog's (1970a) general method for
the analysis of covariance structures. We use path analysis only to study the
identifiability problem, not for estimation purposes (Hauser & Goldberger,
1970; Werts, Jöreskog, & Linn, in press).

The above analysis leads to our "rule of three": Whenever the
correlations among at least three observed variables may be completely ascribed to
the presence of an underlying factor, then the loadings (correlations) for
each observed variable on that factor are identifiable. An important
qualification is that the expected correlation between any two observed variables
cannot be zero since equation (2a) would not be defined when that correlation
was in the denominator. In practice, small expected correlations may lead to
unstable parameter estimates, i.e., highly unreliable measures result in
unreliable parameter estimates.
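
The "rule of three" and the count of overidentifying restrictions lend themselves to a brief numerical sketch. The Python functions below are illustrative only: their names are ours, the loadings are assumed to be positive, and the example correlations are the values implied by hypothetical loadings of .8, .7, and .6.

```python
import math


def loadings_from_triad(r12, r13, r23):
    """Equation (2): loadings of three indicators on one underlying factor.

    Assumes positive loadings; raises if the single-factor model is untenable.
    """
    if 0 in (r12, r13, r23):
        raise ValueError("a zero correlation leaves equation (2) undefined")
    squares = (r12 * r13 / r23, r12 * r23 / r13, r13 * r23 / r12)
    if any(s < 0 or s > 1 for s in squares):
        raise ValueError("squared loading outside [0, 1]; model rejected")
    return tuple(math.sqrt(s) for s in squares)


def overidentifying_restrictions(m):
    """Distinct correlations m(m - 1)/2 less the m loadings to be estimated."""
    return m * (m - 1) // 2 - m


# Hypothetical loadings of .8, .7, and .6 imply these expected correlations.
recovered = loadings_from_triad(0.56, 0.48, 0.42)
print([round(b, 4) for b in recovered])
print([overidentifying_restrictions(m) for m in (3, 4, 5)])
```

Three indicators leave no overidentifying restrictions (the model is just identified), whereas four and five indicators leave two and five restrictions respectively.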

II. Generalizations 

The Figure 1.a. model with or without intervening unmeasured variables
going from $F_1$ to $X_i$ is too limited for most causal analyses. Our purpose
in this section is to consider other causal patterns which satisfy the "rule
of three," i.e., in which the observed correlations among three variables are
nonzero and may be ascribed to the presence of an underlying factor. Equations
(1), and therefore (2), would still hold if for one of the measures (e.g., $X_1$)
$X_1 \rightarrow F_1$ and the residual of this regression of $F_1$ on $X_1$ were
independent of the other residuals $e_2$ and $e_3$, as shown in Figure 1.b.

Figure 1.b.

If two observed measures influence $F_1$, e.g., $X_1 \rightarrow F_1$ and $X_2 \rightarrow F_1$, then it
is no longer true that the correlation between these measures equals the product
of the corresponding path coefficients, e.g., $\rho_{12}$ would not in general equal
$b_1^* b_2^*$.

Given that all residuals are independent when there is an intervening
variable ($I_1$) between $X_i$ and $F_1$, the correlation between a pair of observed
variables $X_i$ and $X_j$ will equal the product of the intervening path coefficients
when $X_i \leftarrow I_1 \leftarrow F_1 \rightarrow X_j$, $X_i \leftarrow I_1 \rightarrow F_1 \rightarrow X_j$, $X_i \rightarrow I_1 \rightarrow F_1 \rightarrow X_j$, and
$X_i \leftarrow I_1 \leftarrow F_1 \leftarrow X_j$, but not when two arrows point towards the same variable,
e.g., $X_i \rightarrow I_1 \leftarrow F_1 \rightarrow X_j$ or $X_i \rightarrow I_1 \rightarrow F_1 \leftarrow X_j$. In general the correlation
between two observed variables may be stated as the product of the intervening
path coefficients whenever the causal linkage between these variables does not
include a variable which is caused by two other variables, i.e., when two
causal arrows point towards a variable. To identify the loadings on a factor
we need to find three observed variables which are causally linked through
that factor, the linkages satisfying the above criteria.
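
The tracing criterion can be made mechanical. As a minimal sketch (the function names and the string encoding of arrows are our own illustrative choices), the following Python code enumerates every orientation of the arrows along the chain from $X_i$ through $I_1$ and $F_1$ to $X_j$ and keeps those in which no intermediate variable has two arrows pointing towards it.

```python
from itertools import product


def has_collider(directions):
    """directions[k] is '->' if node k causes node k + 1, '<-' otherwise.

    An intermediate node receives two arrows exactly when the arrow
    entering it from the left is '->' and the arrow on its right is '<-'.
    """
    return any(left == "->" and right == "<-"
               for left, right in zip(directions, directions[1:]))


def render(nodes, directions):
    """Write a linkage such as 'X_i <- I_1 <- F_1 -> X_j'."""
    parts = [nodes[0]]
    for arrow, node in zip(directions, nodes[1:]):
        parts += [arrow, node]
    return " ".join(parts)


def valid_linkages(nodes):
    """All arrow orientations along the chain that satisfy the criterion."""
    return [render(nodes, dirs)
            for dirs in product(("->", "<-"), repeat=len(nodes) - 1)
            if not has_collider(dirs)]


chain = ["X_i", "I_1", "F_1", "X_j"]
for linkage in valid_linkages(chain):
    print(linkage)
```

Of the eight possible orientations, four survive, and they are the four linkages listed in the text; each of the remaining four contains a variable towards which two arrows point.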

III. Examples 

A. Our first example, which corresponds to Figure 1 in Wiley and Wiley
(1970), is shown in Figure 2.a.

Figure 2.a.

Tracing linkages for $F_2$:

$X_1 \leftarrow F_1 \rightarrow F_2 \rightarrow F_3 \rightarrow X_3$ ,

$X_1 \leftarrow F_1 \rightarrow F_2 \rightarrow X_2$ , and

$X_2 \leftarrow F_2 \rightarrow F_3 \rightarrow X_3$ .

Since these three linkages all include $F_2$ and satisfy the requirements of the
"rule of three" we may conclude that the factor loadings ($\rho_{X_iF_2}$), i.e., the
correlations of each observed variable with $F_2$, are identified. Thus,



PXgF 2 = b i ' ' and ' 



(3) 



^2 



The factor loadings on $F_1$ are not identified because the correlation between
$X_2$ and $X_3$ cannot be completely ascribed to $F_1$. Likewise the loadings on
$F_3$ are not identified because the correlation between $X_1$ and $X_2$ cannot
be ascribed to $F_3$. Jöreskog (1970b) shows that this model may be estimated
by a single factor model with $F_2$ as the common factor and that the example
may be generalized to more than three measured variables.

B. Our second example (see Figure 2.b.) corresponds to Figure 4 in Costner
(1969). The analysis is identical whether $F_1 \rightarrow F_2$ or $F_2 \rightarrow F_1$.

Figure 2.b.

Tracing linkages:

$X_1 \leftarrow F_1 \rightarrow X_2$, (4a)

$X_1 \leftarrow F_1 \rightarrow F_2 \rightarrow X_3$, (4b)

$X_1 \leftarrow F_1 \rightarrow F_2 \rightarrow X_4$, (4c)

$X_2 \leftarrow F_1 \rightarrow F_2 \rightarrow X_3$, (4d)

$X_2 \leftarrow F_1 \rightarrow F_2 \rightarrow X_4$, and (4e)

$X_3 \leftarrow F_2 \rightarrow X_4$. (4f)

For $F_1$ the factor loadings may be identified by linkages 4a, b, d or by
4a, c, e, i.e., these loadings are overidentified and

$$\rho_{X_1F_1} = b_1^*, \quad \rho_{X_2F_1} = b_2^*, \quad \text{and} \quad \rho_{X_3F_1} = b_3^* b_5^*.$$

The factor loadings for $F_2$ may be identified by 4b, c, f or 4d, e, f and:

$$\rho_{X_1F_2} = b_1^* b_5^*, \quad \rho_{X_3F_2} = b_3^*, \quad \text{and} \quad \rho_{X_4F_2} = b_4^*.$$

Since $b_1^*$ and $b_3^*$ are identified, $b_5^*$ is also identified by these equations.
The analysis may be complicated by assuming $e_1$ correlated with $e_3$,
in which case linkage 4b would not be valid; however the conditions of the
"rule of three" would still be satisfied for $F_1$ and $F_2$, and all path
coefficients and correlations between errors are (just) identified. Such a
model would correspond to Figure 5.a. in Costner (1969).

C. The next example, corresponding to Figure 1 in Blalock (1963), is
shown in Figure 2.c. It is basically a variation on the model of
Figure 1.b.

Figure 2.c.

This model differs from that in Figure 2.a. in that $X_1 \rightarrow F_1$ instead of
$F_1 \rightarrow X_1$. The linkages are:

$X_1 \rightarrow F_1 \rightarrow F_2 \rightarrow X_2$,

$X_1 \rightarrow F_1 \rightarrow F_2 \rightarrow F_3 \rightarrow X_3$, and

$X_2 \leftarrow F_2 \rightarrow F_3 \rightarrow X_3$.

Since $F_2$ is in all three linkages which satisfy the "rule of three," the
factor loadings for $F_2$ are identified and

$$\rho_{X_1F_2} = \sqrt{\frac{r_{12}\, r_{13}}{r_{23}}}, \tag{5a}$$

$$\rho_{X_2F_2} = \sqrt{\frac{r_{12}\, r_{23}}{r_{13}}}, \tag{5b}$$

$$\rho_{X_3F_2} = \sqrt{\frac{r_{13}\, r_{23}}{r_{12}}}. \tag{5c}$$

Since $r_{23}$ cannot be ascribed to $F_1$ and $r_{12}$ cannot be ascribed to $F_3$,
the loadings on these factors are not identified. Our heuristic device would
have been helpful to Blalock (1963) since he obtained the equations
corresponding to the linkages shown above, but did not solve them for the
equivalent of equations 5a, b, and c.
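
As a numerical illustration of equations (5a) to (5c), the sketch below recovers
the three loadings on a common factor from the three observed correlations. The
function name `triad_loadings` and the example correlations are hypothetical, and
the sketch assumes the correlations are nonzero with a positive product, so that
the positive square roots can be taken.

```python
from math import sqrt


def triad_loadings(r12, r13, r23):
    """Loadings of three indicators on their common factor, from correlations.

    Assumes the three correlations are nonzero and their product is positive;
    positive roots are returned because the sign of a factor is arbitrary.
    """
    if r12 * r13 * r23 <= 0:
        raise ValueError("correlations are not consistent with one common factor")
    return (
        sqrt(r12 * r13 / r23),  # loading of X1, equation (5a)
        sqrt(r12 * r23 / r13),  # loading of X2, equation (5b)
        sqrt(r13 * r23 / r12),  # loading of X3, equation (5c)
    )


# Hypothetical correlations generated by loadings 0.8, 0.7 and 0.6
loadings = triad_loadings(0.56, 0.48, 0.42)
```

With sample correlations the recovered values are estimates only, and a ratio
greater than one signals that a single common factor does not fit.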



D. Our fourth example, shown in Figure 2.d., corresponds to Figure 2 in
Blalock (1963).

$\theta$ = residual of $F_2$ on $X_4$ and $F_1$

Figure 2.d.

Tracing linkages which satisfy our rule:

$X_1 \rightarrow F_1 \rightarrow X_2$, (6a)

$X_1 \rightarrow F_1 \rightarrow F_2 \rightarrow X_3$, (6b)

$X_2 \leftarrow F_1 \rightarrow F_2 \rightarrow X_3$, and (6c)

$X_4 \rightarrow F_2 \rightarrow X_3$. (6d)

In this model it is assumed that $X_4$ is independent of $X_1$ and $X_2$. The
loadings on $F_1$ are identified by linkages 6a, b and c and therefore:

$$\rho_{X_1F_1} = b_1^*, \tag{7a}$$

$$\rho_{X_2F_1} = b_2^*, \tag{7b}$$

$$\rho_{X_3F_1} = b_3^* b_5^*. \tag{7c}$$

It is not possible to find three observed variables whose linkages satisfy
our rule for $F_2$, i.e., the linkage between $X_1$ and $X_4$ has two arrows
pointing at $F_2$ and the linkage between $X_1$ and $X_2$ does not include $F_2$.
Since $\rho_{34} = b_3^* b_4^*$ it follows from equation (7c) that
$\rho_{X_3F_1} b_4^* = \rho_{34} b_5^*$.

E. The fifth example, shown in Figure 2.e., has the special feature
of two observed nonindependent variables influencing an unobserved variable.
It corresponds to Figure 4 in Blalock (1969).

$\theta$ = residual of $F_1$ on $X_1$ and $X_2$ (regression)

Figure 2.e.

When $X_2$ is deleted, $X_1$, $X_3$, and $X_4$ form the model in Figure 1.b., from
which we conclude that the correlations of $X_1$, $X_3$, and $X_4$ with $F_1$ are
identified. Similarly when $X_1$ is deleted the correlations of $X_2$, $X_3$, and
$X_4$ with $F_1$ are identified. Given the correlations among $X_1$, $X_2$, and $F_1$
the path coefficients $b_1^*$ and $b_2^*$ may be identified since:

$$b_1^* = \frac{\rho_{X_1F_1} - \rho_{12}\,\rho_{X_2F_1}}{1 - \rho_{12}^2}$$

and

$$b_2^* = \frac{\rho_{X_2F_1} - \rho_{12}\,\rho_{X_1F_1}}{1 - \rho_{12}^2}.$$

F. Our last example, shown in Figure 2.f., corresponds to Figure 9.b. in
Costner (1969).

Figure 2.f.

From the analysis of the Figure 2.b. model we may deduce that when $X_4$ is
excluded $b_1^*$, $b_2^*$, $b_3^*$, $b_5^*$, $b_6^*$, and $b_7^*$ are identified. Using the
variables $X_1$, $X_2$, and $X_4$ we know from our analysis of the Figure 1
model that the correlation of $X_4$ with $F_1$ ($\rho_{X_4F_1}$) is identified and
similarly using $X_4$, $X_5$, and $X_6$ we know that the correlation of $X_4$
with $F_2$ ($\rho_{X_4F_2}$) is identified. Since the correlations among $F_1$, $F_2$
and $X_4$ are identified it follows that the path coefficients $b_4^*$ and $b_8^*$,
which are functions of these correlations, are identified. As compared to
Costner's (1969) rather complex algebraic analysis of this problem, it may
be seen that we are satisfied in merely knowing that the model parameters
are identified.

IV. Estimation. 

Jöreskog's (1970a) general model for the analysis of covariance
structures can be used to estimate the parameters for the models discussed
above. Werts, Jöreskog and Linn (in press) discuss the use of Jöreskog's
model from the perspective of path analysis. Use of the associated computer
program (Jöreskog, Gruvaeus, & van Thillo, 1970) for the present
purposes requires the investigator to specify a matrix $\Lambda$ corresponding
to the factor loadings in factor analysis, a matrix $\Phi$ which is the
variance-covariance matrix of the unmeasured factors, and a matrix $\Theta^2$ of
residual variances. The matrices $B$ and $\Psi$ in Jöreskog's formula are taken
as the identity and zero matrix respectively.

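Consider for example the model in Figure 1 in which

$$\Lambda = \begin{bmatrix} b_1^* \\ b_2^* \\ b_3^* \end{bmatrix}, \quad
\Phi = [1], \quad \text{and} \quad
\Theta^2 = \begin{bmatrix} V_{e_1} & 0 & 0 \\ 0 & V_{e_2} & 0 \\ 0 & 0 & V_{e_3} \end{bmatrix}.$$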


Define: $X$ = column vector of standardized observed variables,
$F$ = column vector of factors, and
$e$ = column vector of residuals.

In matrix terminology:

$$X = \Lambda F + e \tag{8}$$

Equation (8) is shorthand for the path equations (all variables standardized):

$$X_1 = b_1^* F_1 + e_1, \quad X_2 = b_2^* F_1 + e_2, \quad \text{and} \quad X_3 = b_3^* F_1 + e_3.$$

It can be seen that $\Lambda$ is the matrix of the coefficients of the factors in
these equations. The parameters in the matrices specifying the model structure
in Jöreskog's model are of three kinds: (1) fixed parameters that have been
assigned given values; (2) constrained parameters that are unknown but equal to
one or more other parameters; and (3) free parameters that are unknown and
not constrained to be equal to any other parameter. In the above example
the unity in $\Phi$ is a fixed parameter, whereas the $b_i^*$ in $\Lambda$ and the
$V_{e_i}$ in $\Theta^2$ are free parameters.

The expected variance-covariance matrix $\Sigma$ for this problem is:

$$\Sigma = \Lambda \Phi \Lambda' + \Theta^2 \tag{9}$$

where the 1 in $\Phi$ is the variance of $F_1$, for convenience standardized
(i.e., equal to unity) and $\Theta^2$ is a diagonal matrix whose elements are the
error variances ($V_{e_i}$). Equation (9) should be recognized as a shorthand way
of expressing all the path equations relating expected model correlations to
model parameters, i.e.,

$$\Sigma = \begin{bmatrix} 1 & \rho_{12} & \rho_{13} \\ \rho_{12} & 1 & \rho_{23} \\ \rho_{13} & \rho_{23} & 1 \end{bmatrix}$$

(where unities indicate observed variables were standardized).

Equation (9) states:

$$1 = (b_1^*)^2 + V_{e_1}, \quad 1 = (b_2^*)^2 + V_{e_2}, \quad 1 = (b_3^*)^2 + V_{e_3},$$

$$\rho_{12} = b_1^* b_2^*, \quad \rho_{13} = b_1^* b_3^*, \quad \text{and} \quad \rho_{23} = b_2^* b_3^*.$$
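
The matrix product in equation (9) can be checked numerically for the single
factor case. The sketch below is only an illustration under stated assumptions:
the function name `implied_covariance` and the loadings 0.8, 0.7 and 0.6 are
hypothetical, and the error variances are chosen as one minus the squared
loading so that the observed variables are standardized.

```python
def implied_covariance(loadings, error_variances, factor_variance=1.0):
    """Sigma = Lambda Phi Lambda' + Theta^2 for one common factor.

    loadings        -- the column Lambda, one entry per observed variable
    error_variances -- the diagonal of Theta^2
    factor_variance -- the single element of Phi (1.0 when standardized)
    """
    if len(loadings) != len(error_variances):
        raise ValueError("need one error variance per loading")
    m = len(loadings)
    sigma = []
    for i in range(m):
        row = []
        for j in range(m):
            value = loadings[i] * factor_variance * loadings[j]
            if i == j:
                value += error_variances[i]  # Theta^2 is diagonal
            row.append(value)
        sigma.append(row)
    return sigma


b = [0.8, 0.7, 0.6]                      # hypothetical path coefficients
v = [1.0 - x ** 2 for x in b]            # error variances of standardized variables
sigma = implied_covariance(b, v)
```

The diagonal of `sigma` reproduces the three unit variances and each
off-diagonal element is the product of two loadings, as equation (9) states.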



This short description for a single model contrasts with the path analysis
approach to estimation used by Costner (1969) and Blalock (1969) in the
following respects:

(a) The matrix $\Sigma$ of expected correlations between observed variables
will differ from the actually observed matrix of correlations because of
sampling and/or model specification errors. Thus we do not use observed
correlations in our equations as in the usual path analysis approach. Instead,
Jöreskog's program attempts to minimize the difference between observed and
expected variance-covariance matrices using either a least squares or maximum
likelihood approach. In large samples, assuming that observed variables
are distributed normally, a chi square statistic is produced which measures
the overall fit of the model to the data. Another way of gauging fit is to
compare the differences between the observed correlations and the expected
correlations generated by the model.

(b) The degrees of freedom (df) for the $\chi^2$ measure are equal to the
number of overidentifying restrictions. In path analysis this corresponds
to the number of different ways the path equations may be solved for each
parameter. To compute the df it is necessary to count the number of distinct
elements in $\Sigma$ (i.e., $m(m+1) \div 2$) and subtract the number of parameters
to be estimated (e.g., $b_1^*$, $b_2^*$, $b_3^*$, $V_{e_1}$, $V_{e_2}$, and $V_{e_3}$). There is
no need to solve the path equations in Jöreskog's approach, although the
identifiability must be known.
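
The counting rule in (b) amounts to simple arithmetic, sketched below. The
function names are hypothetical. A non-negative count is only a necessary
condition, so the sketch does not replace the check of identifiability
described above.

```python
def distinct_elements(m):
    """Number of distinct elements in a symmetric m x m matrix: m(m + 1) / 2."""
    return m * (m + 1) // 2


def overidentifying_restrictions(m, n_parameters):
    """Degrees of freedom: distinct elements of Sigma minus free parameters."""
    df = distinct_elements(m) - n_parameters
    if df < 0:
        raise ValueError("more parameters than distinct elements in Sigma")
    return df


# Figure 1 model: three observed variables, six parameters
df_figure_1 = overidentifying_restrictions(3, 6)
# Four observed variables and nine parameters leave one restriction
df_four_vars = overidentifying_restrictions(4, 9)
```

For the Figure 1 model the count is zero, i.e., three indicators of one factor
are just identified.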

To analyze the model in Figure 1.b., we merely need to note that when
$X_1$ and $F_1$ are standardized the regression of $X_1$ on $F_1$ equals that of
$F_1$ on $X_1$ and the residuals are identical. Thus we may use the same
estimation procedure for this model as for that in Figure 1.a. (where $\theta_1 = e_1$).
Likewise the models in Figures 2.a. and 2.c. may be estimated by ignoring
$F_1$ and $F_3$, and treating $X_1$, $X_2$, and $X_3$ as indicators of the common
factor $F_2$.

The model in Figure 2.b. with the added feature of $e_1$ and $e_3$ correlated
requires special treatment. The equations are:

$$X_1 = b_1^* F_1 + e_1,$$
$$X_2 = b_2^* F_1 + e_2,$$
$$X_3 = b_3^* F_2 + e_3,$$
$$X_4 = b_4^* F_2 + e_4, \text{ and}$$
$$F_2 = b_5^* F_1 + \theta_2.$$

We know that $b_5^*$ is equal to the correlation between $F_1$ and $F_2$ so
there is no need to replace $F_2$ by $b_5^* F_1$ and $\theta_2$ in the first four
equations. To specify a correlation between $e_1$ and $e_3$, all residuals must
be treated as factors, i.e., $F' = (F_1, F_2, e_1, e_2, e_3, e_4)$. The structure is:

$$\Lambda = \begin{bmatrix}
b_1^* & 0 & b_{e_1}^* & 0 & 0 & 0 \\
b_2^* & 0 & 0 & b_{e_2}^* & 0 & 0 \\
0 & b_3^* & 0 & 0 & b_{e_3}^* & 0 \\
0 & b_4^* & 0 & 0 & 0 & b_{e_4}^*
\end{bmatrix}$$

and

$$\Phi = \begin{bmatrix}
1 & b_5^* & 0 & 0 & 0 & 0 \\
b_5^* & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & \rho_{e_1e_3} & 0 \\
0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & \rho_{e_1e_3} & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}.$$

In contrast to previous formulations the error variances are standardized
so that the correlations between $e_1$ and $e_3$ and between $F_1$ and $F_2$ are
estimated directly, and in $\Lambda$ the path coefficients of the observed
variables on their errors ($b_{e_i}^*$) are estimated. This model has 10 distinct
elements in $\Sigma$ and 10 parameters to be estimated ($b_1^*$, $b_2^*$, $b_3^*$, $b_4^*$,
$b_{e_1}^*$, $b_{e_2}^*$, $b_{e_3}^*$, $b_{e_4}^*$, $b_5^*$, $\rho_{e_1e_3}$), i.e., the model is
just identified. The expected variance-covariance matrix is
$\Sigma = \Lambda \Phi \Lambda'$, i.e., the matrix $\Theta$ is taken to be zero.



The Figure 2.d. model poses two problems: the parameters $b_3^*$, $b_4^*$, and
$b_5^*$ are not identified and the expected correlation between $X_4$ and $X_1$ (or
$X_2$) is specified as zero even though the observed correlation may differ
from zero, presumably because of sampling fluctuations. The analysis in
Section II showed that $X_4$ does not contribute to the identification
of parameters, i.e., only the product $b_3^* b_5^*$ is identified with or without
$X_4$. Without $X_4$ the model is that of Figure 1.b. and no purpose is
served by retaining $F_2$. Assuming all variables are standardized,
$X_1 = b_1^* F_1 + \theta_1$ may be substituted for $F_1 = b_1^* X_1 + e_1$ as noted earlier.
With $F_2$ eliminated and knowing that only the correlation of $X_3$ with
$F_1$ is identified the model may be written as:

$$X_1 = b_1^* F_1 + e_1, \tag{10a}$$

$$X_2 = b_2^* F_1 + e_2, \tag{10b}$$

$$X_3 = b_3^* b_5^* F_1 + b_3^* b_4^* X_4 + e_3', \quad \text{where } e_3' = b_3^* \theta + e_3. \tag{10c}$$

For convenience define $b_{35}^* = b_3^* b_5^*$ and $b_{34}^* = b_3^* b_4^*$. For computational
simplicity define a new factor $x_4$ which is identical to the observed $X_4$,
i.e., $X_4 = x_4$. The factors are then $F' = (F_1, x_4)$,

$$\Lambda = \begin{bmatrix} b_1^* & 0 \\ b_2^* & 0 \\ b_{35}^* & b_{34}^* \\ 0 & 1 \end{bmatrix}, \quad
\Phi = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \quad \text{and} \quad
\Theta^2 = \begin{bmatrix} V_{e_1} & 0 & 0 & 0 \\ 0 & V_{e_2} & 0 & 0 \\ 0 & 0 & V_{e_3'} & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}.$$

In the fourth row of $\Theta^2$ the diagonal cell is zero to indicate the identity
$X_4 = x_4$ without residuals. If the expected matrix $\Sigma$ is computed, i.e.,
$\Sigma = \Lambda \Phi \Lambda' + \Theta^2$, we find:

$$\Sigma = \begin{bmatrix}
V_{X_1} & & & \\
b_1^* b_2^* & V_{X_2} & & \\
b_1^* b_{35}^* & b_2^* b_{35}^* & V_{X_3} & \\
0 & 0 & b_{34}^* & V_{X_4}
\end{bmatrix}.$$

This shows that the expected correlations of $X_1$ and $X_2$ with $X_4$ are
zero. This follows from the specification in $\Phi$ that $x_4$ is uncorrelated
with $F_1$.



In the analysis of the Figure 2.e. model, the correlations among
$X_1$, $X_2$, and $F_1$ were identified first and then $b_1^*$ and $b_2^*$ were identified
from these correlations. The simplest estimation procedure is to estimate
the correlations among $X_1$, $X_2$, and $F_1$ and then compute $b_1^*$ and $b_2^*$
from the estimated correlations. This problem can be handled by defining
two factors, $x_1 = X_1$ and $x_2 = X_2$. The structural equations are:

$$X_1 = x_1,$$
$$X_2 = x_2,$$
$$X_3 = b_3^* F_1 + e_3, \text{ and}$$
$$X_4 = b_4^* F_1 + e_4.$$

The factors are $F' = (x_1, x_2, F_1)$,

$$\Lambda = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & b_3^* \\ 0 & 0 & b_4^* \end{bmatrix}, \quad
\Phi = \begin{bmatrix} V_{x_1} & & \\ \rho_{x_1x_2} & V_{x_2} & \\ \rho_{x_1F_1} & \rho_{x_2F_1} & 1 \end{bmatrix}, \quad \text{and} \quad
\Theta^2 = \begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & V_{e_3} & 0 \\ 0 & 0 & 0 & V_{e_4} \end{bmatrix}.$$

There are 10 distinct elements in $\Sigma$ and nine parameters to be estimated
($b_3^*$, $b_4^*$, $V_{x_1}$, $V_{x_2}$, $\rho_{x_1x_2}$, $\rho_{x_1F_1}$, $\rho_{x_2F_1}$, $V_{e_3}$, and $V_{e_4}$),
so that the model has one overidentifying restriction. Note that the estimated
elements of $\Phi$ should be used to estimate $b_1^*$ and $b_2^*$ ($r_{X_1X_2}$ may not
equal $\rho_{12}$).

In relation to the model in Figure 2.f., Costner (1969) discussed the
problem of ascertaining whether $b_8^*$ was zero and of distinguishing the
$b_8^* = 0$ model from one in which errors (e.g., $e_3$ and $e_4$) were correlated.
To see how this is accomplished in Jöreskog's approach, first consider the
model when $b_8^* = 0$ and treating residuals as factors:

$X' = (X_1, X_2, X_3, X_4, X_5, X_6)$,

$F' = (F_1, F_2, e_1, e_2, e_3, e_4, e_5, e_6)$,



A = 



4> = 
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» 

and 

$\Sigma = \Lambda \Phi \Lambda'$ (i.e., $\Theta^2 = 0$).

Note that we have chosen not to introduce the residual $\theta$ into the analysis
because we wish to standardize both $F_1$ and $F_2$, in which case
$\rho_{F_1F_2} = b_7^*$. This model is a variation of that in Figure 2.b. and all
parameters are identified. There are 21 distinct elements in $\Sigma$ and 14
parameters to be estimated so that there are seven overidentifying restrictions.
To test $b_8^* \neq 0$, we specify $X_4 = b_8^* F_1 + b_4^* F_2 + e_4$, i.e., in $\Lambda$ the
fourth row, first column element is left "free" instead of fixed = zero. This
model has one more parameter to be estimated and therefore six overidentifying
restrictions. Thus the original model is more restrictive and will therefore
typically have a larger $\chi^2$. In large samples, the difference in
$\chi^2$ between these two models, with degrees of freedom equal to the difference
in number of restrictions, can be used to test the hypothesis that
$b_8^* \neq 0$. Similarly the model with $e_3$ and $e_4$ correlated ("free") in
$\Phi$ instead of independent (fixed = 0) would have six degrees
of freedom and the difference in $\chi^2$ with one degree of freedom would be
a test of the hypothesis that $e_3$ and $e_4$ are uncorrelated. A comparison
of the $\chi^2$ for $b_8^* \neq 0$ to that for $\rho_{e_3e_4} \neq 0$ gives an indication of
which is the better fitting model. Costner (1969, Figure 10) also raises
the question of whether $e_1$ and $e_2$ are correlated. This hypothesis is
tested by allowing the covariance between $e_1$ and $e_2$ in $\Phi$ to be "free,"
the change in $\chi^2$ with one degree of freedom providing the appropriate
statistical test. Hypotheses involving "constrained" parameters may be
tested similarly, e.g., $b_i^* = b_j^*$ (Heise, 1969) or $V_{e_i} = V_{e_j}$ (Wiley &
Wiley, 1970).
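
The comparison of two nested models described above can be sketched as follows.
This is an illustration only: the function names and the two $\chi^2$ values are
hypothetical, and the tail probability is computed from the standard recurrence
for the chi square distribution with integer degrees of freedom rather than
taken from Jöreskog's program.

```python
from math import erfc, exp, gamma, sqrt


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df >= 1."""
    if df < 1 or x < 0:
        raise ValueError("need df >= 1 and x >= 0")
    z = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, exp(-z)            # starting value Q(1, z)
    else:
        a, q = 0.5, erfc(sqrt(z))      # starting value Q(1/2, z)
    while a < df / 2.0:
        # recurrence Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1)
        q += z ** a * exp(-z) / gamma(a + 1.0)
        a += 1.0
    return q


def chi2_difference_test(chi2_restricted, df_restricted, chi2_free, df_free):
    """Difference in chi-square between nested models, its df and p-value."""
    diff = chi2_restricted - chi2_free
    ddf = df_restricted - df_free
    if ddf < 1:
        raise ValueError("the restricted model must have more degrees of freedom")
    return diff, ddf, chi2_sf(max(diff, 0.0), ddf)


# Hypothetical fit statistics: restricted model (7 df) against a model
# with one additional free parameter (6 df)
diff, ddf, p_value = chi2_difference_test(15.2, 7, 9.1, 6)
```

A small p-value would lead to rejecting the restriction, e.g., the restriction
that the additional path coefficient is zero.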



It can be observed that use of Jöreskog's program requires the investigator
to know the identification status of each parameter, but does not
require the complex algebraic manipulations provided by Costner (1969)
and Blalock (1969). It is important to recognize the essentials of each
model in order to fit it into Jöreskog's general model. Jöreskog's model
assumes that the observed variables are "random" rather than "fixed" but
it is doubtful that most applied sociologists need to be concerned about
this issue, which is minor in comparison to the usual questionable validity
of measures and models.

Intraclass Reliability Estimates: Testing Structural Assumptions
Werts, C. E., Linn, R. L., and Jöreskog, K. G.

Abstract

Intraclass correlation reliability estimates are based on the
assumption that the various measures are equivalent. Jöreskog's (1970) general
model for the analysis of covariance structures can be used to test the
validity of this assumption.

The validity of using intraclass correlation to estimate reliability
is dependent on a variety of assumptions (Winer, 1962, Chapter 4; Cronbach,
Rajaratnam, & Gleser, 1963; Stanley, 1971, pgs. 420-429). This paper will focus
on testing the assumption that the various measures are "equivalent" or
"parallel" (Lord & Novick, 1968, pg. 48).

Jöreskog's (1970) general model for the analysis of covariance
structures will be used for this purpose. Some implications for
generalizability theory (Cronbach, Rajaratnam, & Gleser, 1963; Rajaratnam,
Cronbach, & Gleser, 1965; Gleser, Cronbach, & Rajaratnam, 1965) will be
considered.

I. Jöreskog's General Model for the Analysis of Covariance Structures

Quoting Jöreskog, van Thillo, & Gruvaeus (1971, pg. 2-3):

"The general model considers a data matrix $X(N \times p)$ of $N$ observations
on $p$ variates and assumes that the rows of $X$ are independently distributed,
each having a multivariate normal distribution with the same variance-
covariance matrix $\Sigma$. It is assumed that

$$E(X) = A \Xi P, \tag{1}$$

where $A(N \times g) = (a_{ns})$ and $P(h \times p) = (p_{ti})$ are known matrices of ranks
$g$ and $h$, respectively, $g \le N$, $h \le p$ and $\Xi(g \times h) = (\xi_{st})$ is a matrix
of parameters; and that $\Sigma$ has the form

$$\Sigma = B(\Lambda \Phi \Lambda' + \Psi^2)B' + \Theta^2, \tag{2}$$

where the matrices $B(p \times q) = (\beta_{ik})$, $\Lambda(q \times r) = (\lambda_{km})$, the symmetric
matrix $\Phi(r \times r) = (\phi_{mn})$ and the diagonal matrices $\Psi(q \times q) = (\delta_{kl}\psi_k)$
and $\Theta(p \times p) = (\delta_{ij}\theta_i)$ are parameter matrices.

Thus the general model is one where means, variances and covariances
are structured in terms of other sets of parameters that are to be estimated.
In any application of this model, $p$, $N$ and $X$ will be given by the data,
and $g$, $h$, $q$, $r$, $A$ and $P$ will be given by the particular application.
In any such application we shall allow for any one of the parameters
in $\Xi$, $B$, $\Lambda$, $\Phi$, $\Psi$ and $\Theta$ to be known a priori and for one or more
subsets of the remaining parameters to have identical but unknown values.
Thus parameters are of three kinds: (i) fixed parameters that have been
assigned given values, (ii) constrained parameters that are unknown but
equal to one or more other parameters and (iii) free parameters that are
unknown and not constrained to be equal to any other parameter.

The computer program estimates the free and constrained parameters of
any such model by the maximum likelihood method and provides a test of
goodness of fit of the whole model against the general alternative that $P$ is
square and $\Xi$ and $\Sigma$ are unconstrained. A test of a specified model
(hypothesis) may be obtained, in large samples, by computing the maximum
likelihood solution under the two models and then setting up the likelihood
ratio test (see 1.5). In the special case when both $\Xi$ and $\Sigma$ are
unconstrained, one may test a sequence of hypotheses of the form

$$C \Xi D = 0, \tag{3}$$

where $C(s \times g)$ and $D(h \times t)$ are given matrices of ranks $s$ and $t$,
respectively."

The research reported herein was performed pursuant to Grant No.
OEG-2-7000 33(509) with the United States Department of Health, Education,
and Welfare and the Office of Education.

II. Application

For illustrative purposes consider the situation in which four
alternate forms (ratings, etc.) of a test are administered to the same
people, the testing conditions being such as to justify the assumption that
the person's scores on the alternate forms are experimentally
independent. In Cronbach's terminology the facet
under consideration is alternate forms and there are four conditions of
this facet under which each person is observed. The data would be analyzed
with a two-way analysis of variance (ANOVA) model in which each row
corresponds to the scores for a given person and each column to a different
measure as shown in Table 1.





                    Alternate Forms

Person     X_1      X_2      X_3      X_4      Total    Mean

1          X_11     X_12     X_13     X_14     P_1      P̄_1
2          X_21     X_22     X_23     X_24     P_2      P̄_2
...
N          X_N1     X_N2     X_N3     X_N4     P_N      P̄_N

Total      T_1      T_2      T_3      T_4      G
Mean       T̄_1      T̄_2      T̄_3      T̄_4               Ḡ

Table 1

From this table the mean squares between people ($MS_b$), mean squares
within people ($MS_w$) and residual mean squares ($MS_r$) can be computed as
shown in Winer (1962, Chapter 4). Following Cronbach, et al. (1963), the
reliability ($\hat{\rho}_i$) of the $i$th measure and the reliability ($\hat{\rho}_c$) of a
composite measure may be estimated as ($p$ = # of measures):

$$\hat{\rho}_i = \frac{MS_b - MS_r}{MS_b + (p - 1)\,MS_r} \tag{4}$$

and

$$\hat{\rho}_c = \frac{MS_b - MS_r}{MS_b}. \tag{5}$$

These formulae do not assume that the expected values of the test means are
equal; however if the expected value ($\mu$) of the test means is constant (i.e.,
observed mean differences are due to sampling error) then it would be
appropriate to use:

$$\hat{\rho}_i = \frac{MS_b - MS_w}{MS_b + (p - 1)\,MS_w} \tag{6}$$

and

$$\hat{\rho}_c = \frac{MS_b - MS_w}{MS_b}. \tag{7}$$
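
Equations (4) through (7) can be illustrated with a small persons-by-measures
table. The sketch below assumes the usual two-way ANOVA decomposition with one
observation per cell; the function names and the tiny data set are hypothetical
and serve only to show the arithmetic.

```python
def mean_squares(scores):
    """Return (MS_b, MS_w, MS_r) for scores[i][j] = person i on measure j."""
    n, p = len(scores), len(scores[0])
    grand = sum(map(sum, scores)) / (n * p)
    person_means = [sum(row) / p for row in scores]
    measure_means = [sum(row[j] for row in scores) / n for j in range(p)]
    ss_between = p * sum((m - grand) ** 2 for m in person_means)
    ss_within = sum((x - pm) ** 2
                    for row, pm in zip(scores, person_means) for x in row)
    ss_measures = n * sum((m - grand) ** 2 for m in measure_means)
    ms_b = ss_between / (n - 1)                            # between people
    ms_w = ss_within / (n * (p - 1))                       # within people
    ms_r = (ss_within - ss_measures) / ((n - 1) * (p - 1)) # residual
    return ms_b, ms_w, ms_r


def intraclass_reliabilities(ms_b, ms_error, p):
    """Equations (4)-(5) with ms_error = MS_r, or (6)-(7) with ms_error = MS_w."""
    single = (ms_b - ms_error) / (ms_b + (p - 1) * ms_error)
    composite = (ms_b - ms_error) / ms_b
    return single, composite


# Hypothetical scores: three persons, two measures differing by a constant
scores = [[1, 2], [3, 4], [5, 6]]
ms_b, ms_w, ms_r = mean_squares(scores)
rel_4, rel_5 = intraclass_reliabilities(ms_b, ms_r, p=2)  # means may differ
rel_6, rel_7 = intraclass_reliabilities(ms_b, ms_w, p=2)  # means assumed equal
```

In this example the two measures differ only by a constant, so the residual
mean square is zero and equations (4) and (5) give perfect reliability, whereas
(6) and (7) count the difference in means as error.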



To test assumptions using Jöreskog's method we can start with a model
in which the test means are assumed to differ and all measures have the same
underlying true score, $T$. In terms of equations (1) and (2) this corresponds
to a single factor ($T$) model where the observed vector is

$X' = (X_1, X_2, X_3, X_4)$,

$$A = \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix}, \quad
\Xi = \begin{bmatrix} \mu_1 & \mu_2 & \mu_3 & \mu_4 \end{bmatrix}, \quad
\Lambda = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \\ b_4 \end{bmatrix}, \quad
\Phi = [1],$$

$$\Theta^2 = \begin{bmatrix}
V_{e_1} & 0 & 0 & 0 \\
0 & V_{e_2} & 0 & 0 \\
0 & 0 & V_{e_3} & 0 \\
0 & 0 & 0 & V_{e_4}
\end{bmatrix},$$

$\Psi$ is a null matrix and $B$ an identity matrix.

In this formulation the means in $\Xi$, the factor loadings in $\Lambda$ and the
error variances in $\Theta^2$ are free parameters to be estimated. For convenience
the variance of the true scores $V_T$ has been standardized (i.e., $V_T = 1$
in $\Phi$). Since there are 10 distinct elements in $\Sigma$ (i.e., $p(p + 1) \div 2$)
and only eight parameters in $\Lambda$ and $\Theta^2$ to be estimated (i.e., $b_1$, $b_2$,
$b_3$, $b_4$, $V_{e_1}$, $V_{e_2}$, $V_{e_3}$, and $V_{e_4}$), this model has two overidentifying
restrictions (degrees of freedom). When the maximum likelihood estimation
procedure is used, Jöreskog's program (Jöreskog, van Thillo, & Gruvaeus, 1971)
yields a chi square measure which, in large samples and assuming multivariate
normality of observed variables, is a measure of the fit of the model to
the data. In the illustration this $\chi^2$ with 2 degrees of freedom may
be used to test the assumption that the four measures have a common true
score. If this hypothesis is rejected then the exact meaning of a
reliability estimate is in doubt. Perhaps there is not a single underlying
true factor and/or the error independence assumptions are violated. If the
single factor model is not rejected then reliabilities may be obtained
from parameter estimates, i.e.:

$$\hat{\rho}_i = \frac{\hat{b}_i^2}{\hat{b}_i^2 + \hat{V}_{e_i}} \tag{8}$$

and

$$\hat{\rho}_c = \frac{\left(\sum_i \hat{b}_i\right)^2}{\left(\sum_i \hat{b}_i\right)^2 + \sum_i \hat{V}_{e_i}}. \tag{9}$$


Given a minimum of three measures the $b_i$ are identified given only the
assumption of single factoredness. With two measures it is necessary to make
additional assumptions (e.g., equal $b_i$) for identification.
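
Once the loadings and error variances have been estimated, equations (8) and (9)
reduce to simple arithmetic. The sketch below assumes a standardized true score
($V_T = 1$) and an unweighted sum of the measures as the composite; the function
names and the parameter values are hypothetical.

```python
def single_measure_reliabilities(loadings, error_variances):
    """Equation (8): b_i^2 / (b_i^2 + V_ei) for each measure."""
    return [b * b / (b * b + v) for b, v in zip(loadings, error_variances)]


def composite_reliability(loadings, error_variances):
    """Equation (9): reliability of the unweighted sum of the measures."""
    true_variance = sum(loadings) ** 2       # (sum of b_i)^2 with V_T = 1
    return true_variance / (true_variance + sum(error_variances))


# Hypothetical estimates for four measures
b_hat = [2.0, 2.0, 2.0, 2.0]
v_hat = [1.0, 1.0, 1.0, 1.0]
rho_i = single_measure_reliabilities(b_hat, v_hat)
rho_c = composite_reliability(b_hat, v_hat)
```

With equal loadings and equal error variances, as in this example, the
composite value agrees with the Spearman-Brown formula applied to the single
measure reliability.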

The intraclass correlation and generalizability theory procedures
assume that the measures all have the same units of measurement, i.e., are
"essentially tau equivalent" (Lord & Novick, 1968, pg. 50). It
would not be meaningful to average scores from measures with different
units as is done in Table 1 to obtain person means. In Jöreskog's method
equal units are equivalent to the assumption that the regression weights
$b_i$ are equal, i.e., $b_1 = b_2 = b_3 = b_4 = b$ in $\Lambda$. Therefore, the next step
in the analysis with Jöreskog's program is to constrain the parameters in
$\Lambda$ to be equal, obtaining a new $\chi^2$ estimate of the fit of model to the data.
The $\chi^2$ will have three additional degrees of freedom because of this
constraint. The increase in $\chi^2$ from the previous step (where single
factoredness was tested), with three degrees of freedom, tests the hypothesis
that the units of measurement are equal. If this hypothesis is rejected
then the ANOVA formulation is rejected whether used for estimating
reliability or for generalizability procedures. If the hypothesis of equal
units is not rejected then the parameter estimates may be used to estimate
reliability as follows ($p$ = # measures):

$$\hat{\rho}_i = \frac{\hat{b}^2}{\hat{b}^2 + \hat{V}_{e_i}} \tag{10}$$

and

$$\hat{\rho}_c = \frac{(p\hat{b})^2}{(p\hat{b})^2 + \sum_{i=1}^{p} \hat{V}_{e_i}}. \tag{11}$$

An exactly equivalent formulation is obtained if we fix all $b_i$ in $\Lambda$ equal
to unity, allowing $V_T$ to be free, in which case $\hat{V}_T$ will replace $\hat{b}^2$ in
equations (10) and (11).

The reliability of any single measure from equation (10) may vary
because of differing error variances whereas equations (4) and (6) imply that
all measures have the same reliability. It follows that it is necessary to test
whether the error variances are indeed equal. The third
step (in addition to previous constraints) in the analysis is to constrain
the error variances in $\Theta^2$ to be equal, i.e.,
$V_{e_1} = V_{e_2} = V_{e_3} = V_{e_4} = V_e$.
This will add three degrees of freedom and the increase in $\chi^2$ from the
second step tests the hypothesis of equal error variances. If this hypothesis
is rejected then it may be asserted that equations (5) and (7)
underestimate the composite reliability. If this hypothesis is not
rejected then reliability estimates may be obtained directly from
parameter estimates:

$$\hat{\rho}_i = \frac{\hat{b}^2}{\hat{b}^2 + \hat{V}_e} \tag{12}$$

and

$$\hat{\rho}_c = \frac{(p\hat{b})^2}{(p\hat{b})^2 + p\hat{V}_e}. \tag{13}$$

The estimates from equations (12) and (13) carry the same assumptions as
equations (4) or (5) respectively; however different estimates may result
because (12) and (13) are estimated under structural specifications which
are assumed for (4) and (5), but not constrained to follow. Nonetheless
equations (4) and (5) would in principle be appropriate in this situation.



If the expected variance-covariance matrix ($\Sigma$) is examined it will be seen
that the expected variances (diagonal of $\Sigma$) for the different measures are
equal as are the expected covariances between measures (off diagonal elements
of $\Sigma$). This is precisely the configuration assumed in the ANOVA procedure
when used for testing treatment (between measure) effects (Winer, 1962, pg.
124). Jöreskog's method may also be used to test these "treatment" effects,
i.e., whether the test means differ. To do this we would make the additional
constraints that the elements in $\Xi$ be equal, i.e., $\mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu$.
The resulting increase in $\chi^2$ with three degrees of freedom can be used
to test the hypothesis of equal means. If this hypothesis is rejected
then equations (4) and (5) are more appropriate than (6) or (7). If the
hypothesis is not rejected equations (12) and (13) would still be appropriate;
however the parameter estimates will generally differ because of the
restriction on the means.

Overall it may be observed that the above four analytical steps
test the several aspects of the hypothesis that the different measures
are "equivalent." If the hypotheses from each of the four steps are not
rejected, the implication is that observed differences in means, variances, and
covariances between tests are ascribable to sampling error.
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III. 



Discussion

In essence, equation (9) estimates the reliability of a composite of
the measures included in the study given the assumption of a common
underlying true score. The $\chi^2$ measure of fit associated with this
specification is a test of the validity of this assumption. In contrast,
the intraclass estimate of composite reliability assumes
equivalent measures (implying a single true factor). From a
structural perspective, the intraclass reliability estimate is therefore
of limited applicability and even when measures are equivalent does not
provide population estimates which necessarily are constrained to be
consistent with this assumption. Furthermore, the intraclass estimate is
inappropriate when errors of measurement are nonindependent, e.g., if the
measures were ratings and a single judge did two of the ratings,
the errors for these two measures would probably not be experimentally
independent due to halo effects. In this situation a single factor would not
account for the covariances among measures. Using Jöreskog's method a
model could be used which would allow for the appropriate pair of errors to
be correlated (Werts & Linn, in press). In this case, application of equation
(9) would estimate the squared correlation of the composite score to the true
score, whereas equation (5) would yield meaningless results. Given matched
(all persons take all measures) data, certain aspects of generalizability
theory may be considered in light of the model developed in section II. In
particular, Cronbach, et al. (1963) require the investigator to specify
a universe of conditions of observation over
which he wishes to generalize. The example in section II corresponds to
a single facet design and an investigator might for example specify conditions
$i = 1,2$ as the universe appropriate to his particular study. In
our approach, equation (8) would provide the reliability estimates
for individual measures and in equation (9) sums would be taken over $i = 1,2$
to provide the composite reliability for this particular universe. If we wished
to assume (perhaps because of a $\chi^2$ test) that the measures have the same
units of measurement (as does generalizability
theory), then equations (10) and (11) would apply. Generalizability theory
is clearly superior to intraclass correlation procedures in not requiring
equivalent measures, but is not as flexible as Jöreskog's approach
because of the equal units assumption. Cronbach, et al. (1963) indicate
that the observed scores are determined by the person's universe (i.e.,
"true") score defined as the first centroid factor of the covariances
between conditions in the universe, other centroid factors required to
account for covariances between conditions, and residual variance after
removal of the factors. The variance of the observed scores for a particular
measure equals the squared factor loading on the universe score plus the
sum of squared loadings on the other centroid factors plus residual variance.
From a structural perspective this formulation is problematical because:

(a) The first factor may not be the factor of interest, e.g.,
"methods" factors (Campbell & Fiske, 1959) frequently account for larger
proportions of observed variance than "true," "trait," or "universe" factors.
(b) In reality there may be several underlying "true" factors and/or "other"
factors, which may be oblique.
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